WO2023051368A1 - Encoding and decoding method and apparatus, and device, storage medium and computer program product


Publication number
WO2023051368A1
WO2023051368A1 PCT/CN2022/120495 CN2022120495W WO2023051368A1 WO 2023051368 A1 WO2023051368 A1 WO 2023051368A1 CN 2022120495 W CN2022120495 W CN 2022120495W WO 2023051368 A1 WO2023051368 A1 WO 2023051368A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
scheme
current frame
encoding
decoding
Prior art date
Application number
PCT/CN2022/120495
Other languages
French (fr)
Chinese (zh)
Inventor
刘帅
高原
王宾
王喆
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2023051368A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/22 Mode decision, i.e. based on audio signal content versus external parameters

Definitions

  • The embodiments of the present application relate to the technical field of audio processing, and in particular to an encoding and decoding method, apparatus, device, storage medium, and computer program product.
  • Higher-order ambisonics (HOA)
  • One of the schemes is a codec scheme based on directional audio coding (DirAC).
  • the encoder extracts the core layer signal and spatial parameters from the HOA signal of the current frame, and encodes the extracted core layer signal and spatial parameters into the code stream.
  • the decoding end uses a decoding method symmetrical to the encoding to reconstruct the HOA signal of the current frame from the code stream.
  • Another solution is a codec solution based on virtual speaker selection.
  • The encoder selects, based on the match-projection (MP) algorithm, the target virtual speaker that matches the HOA signal of the current frame from the virtual speaker set, determines the virtual speaker signal based on the HOA signal of the current frame and the target virtual speaker, determines the residual signal based on the HOA signal of the current frame and the virtual speaker signal, and encodes the virtual speaker signal and the residual signal into the code stream.
  • Match-projection (MP)
  • the decoding end uses a decoding method symmetrical to the encoding to reconstruct the HOA signal of the current frame from the code stream.
  • the heterogeneous sound source refers to a point sound source with different positions and/or directions of the sound source.
  • The sound field types of different audio frames may differ. To achieve a high compression rate for audio frames of every sound field type, an appropriate codec scheme must be selected for each audio frame according to its sound field type, which requires switching between different codec schemes.
  • HOA signals reconstructed based on different codec schemes have different auditory quality after rendering and playback. When switching between different codec schemes, how to ensure the smooth transition of auditory quality is a problem that needs to be considered at present.
  • Embodiments of the present application provide an encoding and decoding method, apparatus, device, storage medium, and computer program product, capable of ensuring a smooth transition of auditory quality when switching between different codec schemes. The technical solutions are as follows:
  • In a first aspect, an encoding method is provided, which includes: determining the coding scheme of the current frame according to the higher-order ambisonics (HOA) signal of the current frame, where the coding scheme of the current frame is one of the first coding scheme, the second coding scheme and the third coding scheme. The first coding scheme is the HOA coding scheme based on directional audio coding (the DirAC-based HOA coding scheme), the second coding scheme is the HOA coding scheme based on virtual speaker selection (which can be referred to simply as the MP-based HOA coding scheme), and the third coding scheme is a hybrid coding scheme. If the coding scheme of the current frame is the third coding scheme, the signal of the specified channel in the HOA signal is encoded into the code stream, where the specified channel is a part of all channels of the HOA signal.
  • The hybrid coding scheme uses, in the coding process, both the technical means of the first coding scheme (i.e., the DirAC coding scheme) and the technical means of the second coding scheme (the MP-based HOA coding scheme), so it is called a hybrid coding scheme.
  • an appropriate codec scheme is selected for different audio frames, which can improve the compression rate of the audio signal.
  • For audio frames at which the codec scheme switches, a new codec scheme is used to code and decode these audio frames: the signal of the specified channel in the HOA signal of these audio frames is encoded into the code stream. In other words, a compromise scheme is used for encoding and decoding, so that the auditory quality of the decoded and recovered HOA signal transitions smoothly after rendering and playback.
  • the signal of the specified channel includes a first-order ambisonics (FOA) signal
  • the FOA signal includes an omnidirectional W signal, and directional X signals, Y signals, and Z signals.
  • Encoding the signal of the specified channel in the HOA signal into the code stream includes: determining the virtual speaker signal and the residual signal based on the W signal, the X signal, the Y signal, and the Z signal; and encoding the virtual speaker signal and the residual signal into the code stream.
  • Determining the virtual speaker signal and the residual signal includes: determining the W signal as one virtual speaker signal; and either determining three residual signals based on the W signal, the X signal, the Y signal and the Z signal, or determining the X signal, the Y signal and the Z signal as the three residual signals.
  • the difference signals between the X signal, the Y signal, and the Z signal and the W signal are determined as three-way residual signals.
  • Encoding the virtual speaker signal and the residual signal into the code stream includes: combining the one virtual speaker signal with the first preset mono signal to obtain one stereo signal; combining the three residual signals with the second preset mono signal to obtain two stereo signals; and encoding each of the three resulting stereo signals into the code stream through a stereo encoder.
  • Combining the three residual signals with the second preset mono signal to obtain two stereo signals includes: combining the two residual signals with the highest correlation among the three residual signals to obtain one of the two stereo signals; and combining the remaining residual signal with the second preset mono signal to obtain the other of the two stereo signals.
  • the first preset monophonic signal is an all-zero signal or an all-one signal.
  • the all-zero signal includes a signal whose sampling point values are all zero or a signal whose frequency point values are all zero.
  • The all-one signal includes a signal whose sampling point values are all one or a signal whose frequency point values are all one; the second preset mono signal is an all-zero signal or an all-one signal; the first preset mono signal and the second preset mono signal are the same or different.
  • Alternatively, encoding the virtual speaker signal and the residual signal into the code stream includes: encoding the one virtual speaker signal and each of the three residual signals into the code stream through a mono encoder.
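  • The stereo-pairing option described above can be illustrated with a minimal sketch. It assumes the difference-signal variant of the residuals (X, Y and Z each minus W) and an all-zero preset mono signal for both pairings; the function names, the normalized-correlation measure and the tuple layout are hypothetical choices for illustration and are not taken from the application.

```python
from itertools import combinations
from math import sqrt


def correlation(a, b):
    """Absolute normalized cross-correlation of two equal-length sample lists.

    The application only says "highest correlation"; this particular
    measure is an assumption. Returns 0.0 if either signal is silent.
    """
    num = sum(x * y for x, y in zip(a, b))
    den = sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return abs(num) / den if den else 0.0


def build_stereo_pairs(w, x, y, z):
    """Group one virtual speaker signal and three residuals into three stereo pairs.

    Returns a list of (labels, left, right) tuples that a stereo encoder
    (not shown) would then encode into the code stream.
    """
    preset = [0.0] * len(w)  # assumed all-zero preset mono signal
    # Residuals as difference signals between X/Y/Z and W.
    residuals = {
        "X": [xi - wi for xi, wi in zip(x, w)],
        "Y": [yi - wi for yi, wi in zip(y, w)],
        "Z": [zi - wi for zi, wi in zip(z, w)],
    }
    # Pair the two residuals with the highest correlation.
    best = max(
        combinations(residuals, 2),
        key=lambda p: correlation(residuals[p[0]], residuals[p[1]]),
    )
    leftover = next(k for k in residuals if k not in best)
    return [
        (("W", "preset"), list(w), preset),                       # virtual speaker signal
        (best, residuals[best[0]], residuals[best[1]]),           # most correlated residuals
        ((leftover, "preset"), residuals[leftover], preset),      # remaining residual
    ]
```

  • Under these assumptions, each returned tuple corresponds to one of the three stereo signals; swapping the preset for an all-one signal, or using X, Y and Z directly as residuals, would be equally consistent with the text.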
  • After determining the encoding scheme of the current frame according to the HOA signal of the current frame, the method further includes: if the encoding scheme of the current frame is the first encoding scheme, encoding the HOA signal into the code stream according to the first encoding scheme; if the encoding scheme of the current frame is the second encoding scheme, encoding the HOA signal into the code stream according to the second encoding scheme.
  • Determining the coding scheme of the current frame according to the higher-order ambisonics (HOA) signal of the current frame includes: determining the initial coding scheme of the current frame according to the HOA signal, where the initial coding scheme is the first coding scheme or the second coding scheme; if the initial coding scheme of the current frame is the same as the initial coding scheme of the previous frame of the current frame, determining that the coding scheme of the current frame is the initial coding scheme of the current frame; if the initial coding scheme of the current frame is the first coding scheme and the initial coding scheme of the previous frame is the second coding scheme, or the initial coding scheme of the current frame is the second coding scheme and the initial coding scheme of the previous frame is the first coding scheme, determining that the coding scheme of the current frame is the third coding scheme.
  • the method further includes: encoding the indication information of the initial encoding scheme of the current frame into a code stream.
  • The method further includes: determining the value of the switching flag of the current frame, where the value of the switching flag of the current frame is the first value when the coding scheme of the current frame is the first coding scheme or the second coding scheme, and the second value when the coding scheme of the current frame is the third coding scheme; and encoding the value of the switching flag into the code stream. That is, a switching flag is used to indicate whether the current frame is a switching frame.
  • the method further includes: encoding the indication information of the coding scheme of the current frame into the code stream.
  • the specified channel is consistent with the preset transmission channel in the first encoding scheme. In this way, it can be ensured that the auditory quality of the switching frame is similar to that of the audio frame encoded by using the first encoding scheme.
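  • The scheme decision and the switching flag described above can be sketched as follows. This is an illustrative sketch only: the enum names, the flag values 0 and 1 (standing in for the "first value" and "second value"), and the handling of the very first frame (which has no previous frame) are assumptions not stated in the application.

```python
from enum import Enum


class Scheme(Enum):
    DIRAC = "first"    # DirAC-based HOA coding scheme
    MP = "second"      # HOA coding scheme based on virtual speaker selection
    HYBRID = "third"   # hybrid scheme used for switching frames


FLAG_NORMAL = 0  # assumed encoding of the "first value"
FLAG_SWITCH = 1  # assumed encoding of the "second value"


def decide_scheme(initial_current, initial_previous):
    """Return the coding scheme of the current frame.

    The comparison is made between *initial* schemes of adjacent frames.
    A missing previous frame (None) is treated as "no switch"; that case
    is an assumption.
    """
    if initial_previous is None or initial_current is initial_previous:
        return initial_current
    return Scheme.HYBRID


def switching_flag(scheme):
    """Flag value written to the code stream for the given final scheme."""
    return FLAG_SWITCH if scheme is Scheme.HYBRID else FLAG_NORMAL


def schemes_for_frames(initial_schemes):
    """Apply decide_scheme over a sequence of per-frame initial schemes."""
    previous = None
    result = []
    for initial in initial_schemes:
        result.append(decide_scheme(initial, previous))
        previous = initial  # the next frame compares against this initial scheme
    return result
```

  • With this rule a single hybrid frame appears at every change of the initial scheme, because the frame after a switching frame again has the same initial scheme as its predecessor.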
  • In a second aspect, a decoding method is provided, comprising: obtaining the decoding scheme of the current frame based on the code stream, where the decoding scheme of the current frame is one of the first decoding scheme, the second decoding scheme and the third decoding scheme. The first decoding scheme is a higher-order ambisonics (HOA) decoding scheme based on directional audio decoding, the second decoding scheme is an HOA decoding scheme based on virtual speaker selection, and the third decoding scheme is a hybrid decoding scheme. If the decoding scheme of the current frame is the third decoding scheme: determining, based on the code stream, the signal of the specified channel in the HOA signal of the current frame, where the specified channel is a part of all channels of the HOA signal; determining, based on the signal of the specified channel, the gain of one or more remaining channels in the HOA signal other than the specified channel; determining the signal of each of the one or more remaining channels based on the signal of the specified channel and the gain of the one or more remaining channels; and obtaining the reconstructed HOA signal of the current frame based on the signal of the specified channel and the signals of the one or more remaining channels.
  • determining the signal of the specified channel in the HOA signal of the current frame based on the code stream includes: determining a virtual speaker signal and a residual signal based on the code stream; and determining a signal of the specified channel based on the virtual speaker signal and the residual signal.
  • determining the virtual speaker signal and the residual signal based on the code stream includes: decoding the code stream through a stereo decoder to obtain three stereo signals; based on the three stereo signals, determining one virtual speaker signal and three channels residual signal.
  • Determining one virtual speaker signal and three residual signals based on the three stereo signals includes: determining the one virtual speaker signal based on one of the three stereo signals; and determining the three residual signals based on the other two stereo signals.
  • determining the virtual speaker signal and the residual signal based on the code stream includes: decoding the code stream by a monophonic decoder to obtain one virtual speaker signal and three residual signals.
  • The signal of the specified channel includes a first-order ambisonics (FOA) signal, and the FOA signal includes an omnidirectional W signal and directional X, Y, and Z signals. Determining the signal of the specified channel based on the virtual speaker signal and the residual signal includes: determining the W signal based on the virtual speaker signal; and determining the X signal, Y signal and Z signal based on the residual signal and the W signal, or determining the X signal, Y signal and Z signal based on the residual signal.
  • When the decoding scheme of the current frame is the second decoding scheme, obtaining the reconstructed HOA signal of the current frame according to the code stream includes: obtaining the initial HOA signal from the code stream according to the second decoding scheme; if the decoding scheme of the previous frame of the current frame is the third decoding scheme, performing gain adjustment on the high-order part of the initial HOA signal according to the high-order gain of the previous frame; and obtaining the reconstructed HOA signal based on the low-order part of the initial HOA signal and the gain-adjusted high-order part. That is, the high-order gain adjustment further smooths the transition of auditory quality.
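  • A rough sketch of the high-order gain adjustment follows. The application does not specify how the gain is applied, so this sketch makes several assumptions: the low-order part is taken to be the first four channels (the FOA channels), the previous frame's high-order gain is a single scalar, and the adjustment is a plain multiplication. The function and parameter names are hypothetical.

```python
def adjust_high_order(initial_hoa, prev_high_order_gain, prev_was_switch_frame,
                      low_order_channels=4):
    """Return the reconstructed HOA channels for a frame decoded with the second scheme.

    initial_hoa: list of channels, each a list of samples.
    prev_high_order_gain: scalar gain carried over from the previous frame
        (assumed scalar; a per-channel gain would work the same way).
    prev_was_switch_frame: True if the previous frame used the third scheme.
    """
    if not prev_was_switch_frame:
        # No adjustment needed: pass the initial HOA signal through unchanged.
        return [list(ch) for ch in initial_hoa]
    low = [list(ch) for ch in initial_hoa[:low_order_channels]]
    high = [
        [sample * prev_high_order_gain for sample in ch]
        for ch in initial_hoa[low_order_channels:]
    ]
    # Low-order part unchanged, high-order part gain-adjusted.
    return low + high
```

  • In practice the gain might be interpolated across the frame rather than applied uniformly; the uniform multiply is used here only to keep the illustration short.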
  • Obtaining the decoding scheme of the current frame based on the code stream includes: parsing the value of the switching flag of the current frame from the code stream; if the value of the switching flag is the first value, parsing the indication information of the decoding scheme of the current frame from the code stream, where the indication information indicates that the decoding scheme of the current frame is the first decoding scheme or the second decoding scheme; if the value of the switching flag is the second value, determining that the decoding scheme of the current frame is the third decoding scheme.
  • Alternatively, obtaining the decoding scheme of the current frame based on the code stream includes: parsing the indication information of the decoding scheme of the current frame from the code stream, where the indication information indicates that the decoding scheme of the current frame is the first decoding scheme, the second decoding scheme or the third decoding scheme.
  • Alternatively, obtaining the decoding scheme of the current frame based on the code stream includes: parsing the initial decoding scheme of the current frame from the code stream, where the initial decoding scheme is the first decoding scheme or the second decoding scheme; if the initial decoding scheme of the current frame is the same as the initial decoding scheme of the previous frame of the current frame, determining that the decoding scheme of the current frame is the initial decoding scheme of the current frame; if the initial decoding scheme of the current frame is the first decoding scheme and the initial decoding scheme of the previous frame is the second decoding scheme, or the initial decoding scheme of the current frame is the second decoding scheme and the initial decoding scheme of the previous frame is the first decoding scheme, determining that the decoding scheme of the current frame is the third decoding scheme.
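  • The switching-flag variant of scheme parsing can be sketched as below. The bit layout is purely an assumption for illustration: one bit for the switching flag (0 standing in for the "first value", 1 for the "second value") followed, only for non-switching frames, by one bit of indication information. The code stream is modeled as a list of bits, and the names are hypothetical.

```python
FLAG_NORMAL = 0  # assumed encoding of the "first value"
FLAG_SWITCH = 1  # assumed encoding of the "second value"

# Assumed mapping of the one-bit indication information to decoding schemes.
INDICATION = {0: "first", 1: "second"}


def parse_decoding_scheme(bits, pos=0):
    """Parse the decoding scheme of one frame starting at bit position pos.

    Returns (scheme, next_pos), where scheme is "first", "second" or "third".
    """
    flag = bits[pos]
    pos += 1
    if flag == FLAG_SWITCH:
        # Switching frame: the third (hybrid) scheme, no indication bit follows.
        return "third", pos
    # Non-switching frame: the indication information selects first or second.
    scheme = INDICATION[bits[pos]]
    return scheme, pos + 1
```

  • One consequence of this layout, under the stated assumptions, is that a switching frame spends a single bit on scheme signalling while other frames spend two.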
  • In a third aspect, an encoding device is provided, and the encoding device has the function of implementing the behavior of the encoding method in the first aspect above.
  • the encoding device includes one or more modules, and the one or more modules are used to implement the encoding method provided in the first aspect above.
  • an encoding device comprising:
  • the first determination module is used to determine the coding scheme of the current frame according to the higher-order ambisonics (HOA) signal of the current frame, and the coding scheme of the current frame is one of the first coding scheme, the second coding scheme and the third coding scheme;
  • the first coding scheme is an HOA coding scheme based on directional audio coding
  • the second coding scheme is an HOA coding scheme based on virtual speaker selection
  • the third coding scheme is a hybrid coding scheme
  • the first encoding module is configured to encode the signal of the specified channel in the HOA signal into the code stream if the encoding scheme of the current frame is the third encoding scheme, and the specified channel is a part of all channels of the HOA signal.
  • the signal of the specified channel includes a first-order ambisonics (FOA) signal
  • the FOA signal includes an omnidirectional W signal, and directional X signals, Y signals, and Z signals.
  • the first determination submodule is used to determine the virtual speaker signal and the residual signal based on the W signal, the X signal, the Y signal and the Z signal;
  • the encoding sub-module is used to encode the virtual loudspeaker signal and the residual signal into a code stream.
  • the first determination submodule is used for:
  • the three residual signals are determined based on the W signal, the X signal, the Y signal and the Z signal, or the X signal, the Y signal and the Z signal are determined as the three residual signals.
  • the encoding submodule is used to:
  • the obtained three-way stereo signals are respectively coded into bit streams through a stereo encoder.
  • the encoding submodule is used to:
  • the first preset monophonic signal is an all-zero signal or an all-one signal.
  • the all-zero signal includes a signal whose sampling point values are all zero or a signal whose frequency point values are all zero.
  • The all-one signal includes a signal whose sampling point values are all one or a signal whose frequency point values are all one; the second preset mono signal is an all-zero signal or an all-one signal; the first preset mono signal and the second preset mono signal are the same or different.
  • the encoding submodule is used to:
  • encode the one virtual speaker signal and each of the three residual signals into the code stream through a mono encoder.
  • the device also includes:
  • the second encoding module is used to encode the HOA signal into the code stream according to the first encoding scheme if the encoding scheme of the current frame is the first encoding scheme;
  • the third encoding module is configured to encode the HOA signal into the code stream according to the second encoding scheme if the encoding scheme of the current frame is the second encoding scheme.
  • the first determination module includes:
  • the second determining submodule is used to determine the initial encoding scheme of the current frame according to the HOA signal, where the initial encoding scheme is the first encoding scheme or the second encoding scheme;
  • the third determining submodule is used to determine that the encoding scheme of the current frame is the initial encoding scheme of the current frame if the initial encoding scheme of the current frame is the same as the initial encoding scheme of the previous frame of the current frame;
  • the fourth determining submodule is used to determine that the encoding scheme of the current frame is the third encoding scheme if the initial encoding scheme of the current frame is the first encoding scheme and the initial encoding scheme of the previous frame of the current frame is the second encoding scheme, or if the initial encoding scheme of the current frame is the second encoding scheme and the initial encoding scheme of the previous frame of the current frame is the first encoding scheme.
  • the device also includes:
  • the fourth encoding module is configured to encode the indication information of the initial encoding scheme of the current frame into the code stream.
  • the device also includes:
  • the second determination module is used to determine the value of the switching flag of the current frame: when the coding scheme of the current frame is the first coding scheme or the second coding scheme, the value of the switching flag of the current frame is the first value; when the coding scheme of the current frame is the third coding scheme, the value of the switching flag of the current frame is the second value;
  • the fifth encoding module is used to encode the value of the switching flag into the code stream.
  • the device also includes:
  • the sixth encoding module is configured to encode the indication information of the encoding scheme of the current frame into the code stream.
  • the specified channel is consistent with the preset transmission channel in the first encoding scheme.
  • In a fourth aspect, a decoding device is provided, and the decoding device has the function of implementing the behavior of the decoding method in the second aspect above.
  • the decoding device includes one or more modules, and the one or more modules are used to implement the decoding method provided by the second aspect above.
  • a decoding device which includes:
  • the first obtaining module is used to obtain the decoding scheme of the current frame based on the code stream, and the decoding scheme of the current frame is one of the first decoding scheme, the second decoding scheme and the third decoding scheme; wherein the first decoding scheme is a higher-order ambisonics (HOA) decoding scheme based on directional audio decoding, the second decoding scheme is an HOA decoding scheme based on virtual speaker selection, and the third decoding scheme is a hybrid decoding scheme;
  • the first determination module is used to determine the signal of the specified channel in the HOA signal of the current frame based on the code stream if the decoding scheme of the current frame is the third decoding scheme, and the specified channel is a part of all channels of the HOA signal;
  • the second determination module is used to determine the gain of one or more remaining channels in the HOA signal except the specified channel based on the signal of the specified channel;
  • a third determination module configured to determine the signal of each of the one or more remaining channels based on the signal of the specified channel and the gain of the one or more remaining channels;
  • the second obtaining module is configured to obtain the reconstructed HOA signal of the current frame based on the signal of the designated channel and the signals of the one or more remaining channels.
  • the first determination module includes:
  • a first determining submodule configured to determine a virtual speaker signal and a residual signal based on a code stream
  • the second determining submodule is configured to determine the signal of the specified channel based on the virtual speaker signal and the residual signal.
  • the first determination submodule is used for:
  • one virtual speaker signal and three residual signals are determined.
  • the first determination submodule is used for:
  • the first determination submodule is used for:
  • the code stream is decoded by a monophonic decoder to obtain one virtual speaker signal and three residual signals.
  • the signal of the specified channel includes a first-order ambisonics (FOA) signal
  • the FOA signal includes an omnidirectional W signal, and directional X signals, Y signals, and Z signals;
  • the first determined submodule is used for:
  • the X signal, the Y signal and the Z signal are determined based on the residual signal and the W signal, or the X signal, the Y signal and the Z signal are determined based on the residual signal.
  • the device also includes:
  • the first decoding module is used to obtain the reconstructed HOA signal of the current frame according to the code stream according to the first decoding scheme if the decoding scheme of the current frame is the first decoding scheme;
  • the second decoding module is configured to obtain the reconstructed HOA signal of the current frame according to the code stream according to the second decoding scheme if the decoding scheme of the current frame is the second decoding scheme.
  • the second decoding module includes:
  • the first obtaining submodule is used to obtain the initial HOA signal according to the code stream according to the second decoding scheme
  • the gain adjustment submodule is used to adjust the gain of the high-order part of the initial HOA signal according to the high-order gain of the previous frame of the current frame if the decoding scheme of the previous frame of the current frame is the third decoding scheme;
  • the second obtaining submodule is used to obtain the reconstructed HOA signal based on the low-order part of the initial HOA signal and the gain-adjusted high-order part.
  • the first obtaining module includes:
  • the first parsing submodule is used to parse out the value of the switching flag of the current frame from the code stream;
  • the second parsing submodule is used to parse the indication information of the decoding scheme of the current frame from the code stream if the value of the switching flag is the first value, and the indication information is used to indicate that the decoding scheme of the current frame is the first decoding scheme or second decoding scheme;
  • the third determining submodule is configured to determine that the decoding scheme of the current frame is the third decoding scheme if the value of the switching flag is the second value.
  • the first obtaining module includes:
  • the third parsing sub-module is used to parse out the indication information of the decoding scheme of the current frame from the code stream, and the indication information is used to indicate that the decoding scheme of the current frame is the first decoding scheme, the second decoding scheme or the third decoding scheme.
  • the first obtaining module includes:
  • the fourth parsing submodule is used to parse out the initial decoding scheme of the current frame from the code stream, where the initial decoding scheme is the first decoding scheme or the second decoding scheme;
  • the fourth determining submodule is used to determine that the decoding scheme of the current frame is the initial decoding scheme of the current frame if the initial decoding scheme of the current frame is the same as the initial decoding scheme of the previous frame of the current frame;
  • the fifth determining submodule is used to determine that the decoding scheme of the current frame is the third decoding scheme if the initial decoding scheme of the current frame is the first decoding scheme and the initial decoding scheme of the previous frame of the current frame is the second decoding scheme, or if the initial decoding scheme of the current frame is the second decoding scheme and the initial decoding scheme of the previous frame of the current frame is the first decoding scheme.
  • an encoding end device includes a processor and a memory, and the memory is used to store a program for executing the encoding method provided in the above first aspect, and to store the data involved in implementing the encoding method provided in the above first aspect.
  • the processor is configured to execute programs stored in the memory.
  • the encoding end device may further include a communication bus for establishing a connection between the processor and the memory.
  • a decoding end device includes a processor and a memory, and the memory is used to store a program for executing the decoding method provided in the above second aspect, and to store the data involved in implementing the decoding method provided in the above second aspect.
  • the processor is configured to execute programs stored in the memory.
  • the decoding end device may further include a communication bus for establishing a connection between the processor and the memory.
  • a computer-readable storage medium is provided. Instructions are stored in the computer-readable storage medium. When the instructions are run on a computer, the computer executes the encoding method described in the first aspect or the decoding method described in the second aspect.
  • the eighth aspect provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the encoding method described in the first aspect or the decoding method described in the second aspect.
  • two schemes i.e. the codec scheme based on virtual speaker selection and the codec scheme based on directional audio coding
  • FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of an implementation environment of a terminal scenario provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of an implementation environment of a transcoding scenario of a wireless or core network device provided in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an implementation environment of a broadcast television scene provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an implementation environment of a virtual reality streaming scene provided by an embodiment of the present application.
  • FIG. 6 is a flow chart of an encoding method provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a switching frame coding scheme provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of an HOA coding scheme based on virtual speaker selection provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a DirAC-based HOA coding scheme provided by an embodiment of the present application.
  • FIG. 10 is a flow chart of another encoding method provided by the embodiment of the present application.
  • FIG. 11 is a flow chart of a decoding method provided by an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a switching frame decoding scheme provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of an HOA decoding scheme based on virtual speaker selection provided by an embodiment of the present application.
  • FIG. 14 is a schematic diagram of a DirAC-based HOA decoding scheme provided by an embodiment of the present application.
  • FIG. 15 is a flow chart of another decoding method provided by an embodiment of the present application.
  • FIG. 16 is a schematic structural diagram of an encoding device provided by an embodiment of the present application.
  • FIG. 17 is a schematic structural diagram of a decoding device provided by an embodiment of the present application.
  • FIG. 18 is a schematic block diagram of a codec device provided by an embodiment of the present application.
  • FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
  • the implementation environment includes source device 10 , destination device 20 , link 30 and storage device 40 .
  • the source device 10 may generate encoded media data. Therefore, the source device 10 may also be called a media data encoding device.
  • Destination device 20 may decode the encoded media data generated by source device 10 . Accordingly, destination device 20 may also be referred to as a media data decoding device.
  • Link 30 may receive encoded media data generated by source device 10 and may transmit the encoded media data to destination device 20 .
  • the storage device 40 can receive the encoded media data generated by the source device 10, and can store the encoded media data.
  • the destination device 20 can directly obtain the encoded media data from the storage device 40.
  • the storage device 40 may correspond to a file server or another intermediate storage device that can save the encoded media data generated by the source device 10, in which case the destination device 20 may obtain the encoded media data stored on the storage device 40 via streaming or downloading.
  • Both the source device 10 and the destination device 20 may include one or more processors and a memory coupled to the one or more processors. The memory may include random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures accessible by a computer.
  • both source device 10 and destination device 20 may include desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, video game consoles, in-vehicle computers, or the like.
  • Link 30 may include one or more media or devices capable of transmitting encoded media data from source device 10 to destination device 20 .
  • link 30 may include one or more communication media that enable source device 10 to transmit encoded media data directly to destination device 20 in real-time.
  • the source device 10 may modulate the encoded media data based on a communication standard, such as a wireless communication protocol, etc., and may send the modulated media data to the destination device 20 .
  • the one or more communication media may include wireless and/or wired communication media, for example, the one or more communication media may include radio frequency (radio frequency, RF) spectrum or one or more physical transmission lines.
  • the one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (e.g., the Internet), among others.
  • the one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from the source device 10 to the destination device 20, etc., which are not specifically limited in this embodiment of the present application.
  • the storage device 40 may store the received encoded media data sent by the source device 10 , and the destination device 20 may directly acquire the encoded media data from the storage device 40 .
  • the storage device 40 may include any one of a variety of distributed or locally accessed data storage media, for example, a hard disk drive, a Blu-ray disc, a digital versatile disc (DVD), a compact disc read-only memory (CD-ROM), flash memory, volatile or nonvolatile memory, or any other suitable digital storage medium for storing encoded media data.
  • the storage device 40 may correspond to a file server or another intermediate storage device that can save the encoded media data generated by the source device 10, and the destination device 20 may obtain the media data stored on the storage device 40 via streaming or downloading.
  • the file server may be any type of server capable of storing encoded media data and sending the encoded media data to destination device 20 .
  • the file server may include a network server, a file transfer protocol (file transfer protocol, FTP) server, a network attached storage (network attached storage, NAS) device, or a local disk drive.
  • Destination device 20 may obtain encoded media data over any standard data connection, including an Internet connection.
  • The standard data connection may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., a digital subscriber line (DSL) or a cable modem), or a combination of both that is suitable for obtaining encoded media data stored on a file server.
  • the transmission of encoded media data from storage device 40 may be a streaming transmission, a download transmission, or a combination of both.
  • the implementation environment shown in FIG. 1 is only one possible implementation. The technology of the embodiments of the present application is applicable not only to the source device 10 shown in FIG. 1, which can encode media data, and the destination device 20, which can decode encoded media data, but also to other devices capable of encoding media data and decoding encoded media data, which is not specifically limited in the embodiments of the present application.
  • the source device 10 includes a data source 120 , an encoder 100 and an output interface 140 .
  • output interface 140 may include a modulator/demodulator (modem) and/or a sender, where the sender may also be referred to as a transmitter.
  • Data source 120 may include an image capture device (e.g., a video camera), an archive containing previously captured media data, a feed interface for receiving media data from a media data content provider, and/or a computer graphics system for generating media data, or a combination of these sources of media data.
  • the data source 120 may send media data to the encoder 100, and the encoder 100 may encode the received media data sent by the data source 120 to obtain encoded media data.
  • The encoder 100 may send the encoded media data to the output interface 140.
  • source device 10 sends the encoded media data directly to destination device 20 via output interface 140 .
  • encoded media data may also be stored on storage device 40 for later retrieval by destination device 20 for decoding and/or display.
  • the destination device 20 includes an input interface 240 , a decoder 200 and a display device 220 .
  • input interface 240 includes a receiver and/or a modem.
  • the input interface 240 can receive the encoded media data via the link 30 and/or from the storage device 40 and then send it to the decoder 200, and the decoder 200 can decode the received encoded media data to obtain the decoded media data.
  • the decoder may transmit the decoded media data to the display device 220 .
  • the display device 220 may be integrated with the destination device 20 or may be external to the destination device 20 . In general, the display device 220 displays the decoded media data.
  • the display device 220 can be any of various types of display device, for example, a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or another type of display device.
  • encoder 100 and decoder 200 may each be integrated with an encoder and a decoder, and may include an appropriate multiplexer-demultiplexer (MUX-DEMUX) unit or other hardware and software to handle the encoding of both audio and video in a common data stream or in separate data streams.
  • the MUX-DEMUX unit may conform to the ITU H.223 multiplexer protocol, or other protocols such as user datagram protocol (UDP), if applicable.
  • Each of the encoder 100 and the decoder 200 can be any one of the following circuits: one or more microprocessors, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), discrete logic, hardware, or any combination thereof. If the techniques of the embodiments of the present application are implemented partially in software, the device may store instructions for the software in a suitable non-transitory computer-readable storage medium, and may execute the instructions in hardware using one or more processors to implement the techniques of the embodiments of the present application. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be considered to be one or more processors. Each of the encoder 100 and the decoder 200 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in a corresponding device.
  • Embodiments of the present application may generally refer to the encoder 100 as “signaling” or “sending” certain information to another device such as the decoder 200 .
  • the term “signaling” or “sending” may generally refer to the transmission of syntax elements and/or other data used to decode compressed media data. This transmission can occur in real time or near real time. Alternatively, it may occur over a period of time, for example when syntax elements are stored, at encoding time, in an encoded bitstream on a computer-readable storage medium; the decoding device may then retrieve the syntax elements at any time after they have been stored on this medium.
  • the encoding and decoding methods provided in the embodiments of the present application can be applied to various scenarios. Next, several scenarios will be introduced by taking the media data to be encoded as an HOA signal as an example.
  • FIG. 2 is a schematic diagram of an implementation environment in which a codec method provided by an embodiment of the present application is applied to a terminal scenario.
  • the implementation environment includes a first terminal 101 and a second terminal 201 , and the first terminal 101 and the second terminal 201 are connected in communication.
  • the communication connection may be a wireless connection or a wired connection, which is not limited in this embodiment of the present application.
  • the first terminal 101 may be a sending end device or a receiving end device.
  • the second terminal 201 may be a receiving end device or a sending end device.
  • For example, the first terminal 101 is a sending end device and the second terminal 201 is a receiving end device, or the first terminal 101 is a receiving end device and the second terminal 201 is a sending end device.
  • Both the first terminal 101 and the second terminal 201 include an audio collection module, an audio playback module, an encoder, a decoder, a channel encoding module and a channel decoding module.
  • the encoder is a three-dimensional audio encoder, and the decoder is a three-dimensional audio decoder.
  • the audio collection module in the first terminal 101 collects the HOA signal and transmits it to the encoder.
  • the encoder encodes the HOA signal using the encoding method provided in the embodiment of the present application.
  • This encoding may be called source encoding. Then, so that the HOA signal can be transmitted over the channel, the channel encoding module performs channel encoding, and the resulting code stream is transmitted over the digital channel by a wireless or wired network communication device.
  • the second terminal 201 receives the code stream transmitted over the digital channel through a wireless or wired network communication device, and the channel decoding module performs channel decoding on the code stream. The decoder then decodes the code stream using the decoding method provided in the embodiments of the present application to obtain the HOA signal, which is played by the audio playback module.
  • the first terminal 101 and the second terminal 201 can be any electronic product that can interact with the user through one or more means such as a keyboard, touch pad, touch screen, remote control, voice interaction, or handwriting device, for example, a personal computer (PC), a mobile phone, a smartphone, a personal digital assistant (PDA), a wearable device, a pocket PC (PPC), a tablet computer, a smart in-vehicle unit, a smart TV, or a smart speaker.
  • FIG. 3 is a schematic diagram of an implementation environment in which a codec method provided by an embodiment of the present application is applied to a transcoding scenario of a wireless or core network device.
  • the implementation environment includes a channel decoding module, an audio decoder, an audio encoder and a channel encoding module.
  • the audio encoder is a three-dimensional audio encoder, and the audio decoder is a three-dimensional audio decoder.
  • the audio decoder may be a decoder using the decoding method provided in the embodiment of the present application, or may be a decoder using other decoding methods.
  • the audio encoder may be an encoder using the encoding method provided by the embodiment of the present application, or may be an encoder using other encoding methods.
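  • When the audio decoder is a decoder using the decoding method provided by the embodiment of the present application, the audio encoder is an encoder using other encoding methods. When the audio decoder is a decoder using other decoding methods, the audio encoder is an encoder using the encoding method provided by the embodiment of the present application.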
  • the audio decoder is a decoder using the decoding method provided by the embodiment of the present application, and the audio encoder is an encoder using other encoding methods.
  • the channel decoding module performs channel decoding on the received code stream, the audio decoder then performs source decoding using the decoding method provided by the embodiment of the present application, and the audio encoder then encodes according to other encoding methods. This conversion from one format to another is known as transcoding. The result is then channel-coded and sent.
  • the audio decoder is a decoder using other decoding methods, and the audio encoder is an encoder using the encoding method provided by the embodiment of the present application.
  • the channel decoding module performs channel decoding on the received code stream, the audio decoder then performs source decoding using other decoding methods, and the audio encoder then encodes using the encoding method provided by the embodiment of the present application. This conversion from one format to another is known as transcoding. The result is then channel-coded and sent.
  • the wireless device may be a wireless access point, a wireless router, a wireless connector, and the like.
  • a core network device may be a mobility management entity, a gateway, and the like.
  • FIG. 4 is a schematic diagram of an implementation environment in which a codec method provided by an embodiment of the present application is applied to a broadcast television scene.
  • the broadcast TV scene is divided into a live scene and a post-production scene.
  • In the live scene, the implementation environment includes a live program 3D sound production module, a 3D sound encoding module, a set-top box and a speaker group, and the set-top box includes a 3D sound decoding module.
  • In the post-production scene, the implementation environment includes a post-program 3D sound production module, a 3D sound encoding module, a network receiver, a mobile terminal, earphones, and the like.
  • In the live scene, the live program 3D sound production module produces a 3D sound signal (such as an HOA signal). The encoding method of the embodiment of the present application is applied to the 3D sound signal to obtain a code stream, which is transmitted to the user side through the radio and television network. The 3D sound decoder in the set-top box decodes the code stream using the decoding method provided by the embodiment of the present application, thereby reconstructing the 3D sound signal, which is played back by the speaker group.
  • the code stream is transmitted to the user side via the Internet, and the 3D sound decoder in the network receiver decodes the code stream using the decoding method provided by the embodiment of the present application, thereby reconstructing the 3D sound signal and playing it back by the speaker group.
  • the code stream is transmitted to the user side via the Internet, and the 3D sound decoder in the mobile terminal decodes the code stream using the decoding method provided by the embodiment of the present application, thereby reconstructing the 3D sound signal and playing it back through the earphone.
  • In the post-production scene, the post-program 3D sound production module produces a 3D sound signal, and the encoding method of the embodiment of the present application is applied to the 3D sound signal to obtain a code stream. The code stream is transmitted to the user side, where a 3D sound decoder decodes it using the decoding method provided by the embodiment of the present application, thereby reconstructing the 3D sound signal, which is played back by the speaker group.
  • the code stream is transmitted to the user side via the Internet, and the 3D sound decoder in the network receiver decodes the code stream using the decoding method provided by the embodiment of the present application, thereby reconstructing the 3D sound signal and playing it back by the speaker group.
  • the code stream is transmitted to the user side via the Internet, and the 3D sound decoder in the mobile terminal decodes the code stream using the decoding method provided by the embodiment of the present application, thereby reconstructing the 3D sound signal and playing it back through the earphone.
  • FIG. 5 is a schematic diagram of an implementation environment in which a codec method provided by an embodiment of the present application is applied to a virtual reality streaming scene.
  • the implementation environment includes an encoding end and a decoding end.
  • the encoding end includes an acquisition module, a preprocessing module, an encoding module, a packaging module and a sending module
  • the decoding end includes an unpacking module, a decoding module, a rendering module and earphones.
  • the acquisition module collects the HOA signal, and then preprocesses the HOA signal through the preprocessing module.
  • the preprocessing operation includes filtering out the low-frequency part of the HOA signal, usually with 20 Hz or 50 Hz as the cut-off point, extracting the orientation information in the HOA signal, and so on.
  • The encoding module then encodes the signal using the encoding method provided by the embodiment of the present application. After encoding, the packing module packs the result, which is sent to the decoding end through the sending module.
  • the unpacking module at the decoding end first unpacks the data, the decoding module then decodes it using the decoding method provided by the embodiment of the present application, and the rendering module performs binaural rendering on the decoded signal. The rendered signal is mapped onto the listener's earphones.
  • the earphone can be an independent earphone, or an earphone on a virtual reality glasses device.
  • FIG. 6 is a flow chart of an encoding method provided by an embodiment of the present application, and the encoding method is applied to an encoding end. Please refer to FIG. 6 , the method includes the following steps.
  • Step 601 Determine the coding scheme of the current frame according to the HOA signal of the current frame.
  • the encoder performs encoding frame by frame.
  • the HOA signal of the audio frame is an audio signal obtained through the HOA acquisition technology.
  • the HOA signal is a scene audio signal and also a three-dimensional audio signal.
  • the HOA signal refers to the audio signal obtained by collecting the sound field where the microphone is located in the space.
  • the collected audio signal is called the original HOA signal.
  • the HOA signal of the audio frame may also be an HOA signal obtained by converting a 3D audio signal in another format. For example, convert a 5.1-channel signal into an HOA signal, or convert a 3D audio signal mixed with a 5.1-channel signal and object audio into an HOA signal.
  • the HOA signal of the audio frame to be encoded is a time-domain signal or a frequency-domain signal, and may include all channels of the HOA signal, or may include some channels of the HOA signal.
  • For example, if the order of the HOA signal of the audio frame is 3, the number of channels of the HOA signal is 16, the frame length of the audio frame is 20 ms, and the sampling rate is 48 kHz, then the HOA signal of the audio frame to be encoded contains 16 channels, each containing 960 sampling points.
  • the encoder can down-sample the original HOA signal to obtain the HOA signal of the audio frame. For example, the encoder performs 1/Q down-sampling on the original HOA signal to reduce the number of sampling points or frequency points of the HOA signal to be encoded. For example, in the embodiment of the present application, each channel of the original HOA signal contains 960 sampling points; after 1/120 down-sampling, each channel of the HOA signal to be encoded contains 8 sampling points.
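  • The figures in the two examples above can be checked with a short calculation. The following is a minimal sketch, not part of the described method: the function names are hypothetical, the channel count assumes a full HOA signal of order N with (N + 1)^2 channels, and the down-sampling helper only counts the points that remain, without modelling any filtering.

```python
def hoa_channel_count(order: int) -> int:
    """Channels of a full HOA signal of the given order: (order + 1) ** 2."""
    return (order + 1) ** 2


def samples_per_frame(sample_rate_hz: int, frame_ms: int) -> int:
    """Sampling points per channel in one frame of frame_ms milliseconds."""
    return sample_rate_hz * frame_ms // 1000


def downsampled_length(num_samples: int, q: int) -> int:
    """Points left per channel after 1/q down-sampling (count only)."""
    return num_samples // q


channels = hoa_channel_count(3)             # order-3 HOA signal
samples = samples_per_frame(48_000, 20)     # 48 kHz, 20 ms frame
reduced = downsampled_length(samples, 120)  # 1/120 down-sampling
print(channels, samples, reduced)           # 16 960 8
```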
  • the encoding method of the encoding end is introduced by taking the encoding end encoding the current frame as an example.
  • the current frame is an audio frame to be encoded. That is, the encoding end acquires the HOA signal of the current frame, and encodes the HOA signal of the current frame by using the encoding method provided in the embodiment of the present application.
  • the encoding end first determines the initial encoding scheme of the current frame according to the HOA signal of the current frame, where the initial encoding scheme is the first encoding scheme or the second encoding scheme. By comparing the initial encoding scheme of the current frame with the initial encoding scheme of the previous frame of the current frame, the encoding end then decides whether the first encoding scheme, the second encoding scheme or the third encoding scheme is used to encode the HOA signal of the current frame.
  • If the initial coding scheme of the current frame is the same as the initial coding scheme of the previous frame of the current frame, the encoding end uses the encoding scheme consistent with the initial encoding scheme of the current frame to encode the HOA signal of the current frame. If the initial coding scheme of the current frame is different from the initial coding scheme of the previous frame of the current frame, the encoding end uses the switching frame coding scheme to encode the HOA signal of the current frame.
  • the coding scheme of the current frame is one of the first coding scheme, the second coding scheme and the third coding scheme.
  • the first coding scheme is a DirAC-based HOA coding scheme, the second coding scheme is an HOA coding scheme based on virtual speaker selection, and the third coding scheme is a hybrid coding scheme.
  • the hybrid coding scheme is also referred to as a switched frame coding scheme.
  • the third coding scheme is a switching frame coding scheme provided by the embodiment of the present application, and the third coding scheme is for smooth transition of auditory quality when switching between different codec schemes.
  • the HOA coding scheme based on virtual speaker selection is also referred to as the MP-based HOA coding scheme.
  • the coding end determines the initial coding scheme of the current frame according to the HOA signal of the current frame. Then, the encoding end determines the encoding scheme of the current frame based on the initial encoding scheme of the current frame and the initial encoding scheme of the previous frame of the current frame. It should be noted that this embodiment of the present application does not limit the implementation manner in which the encoding end determines the initial encoding scheme.
  • the coding end analyzes the sound field type of the HOA signal of the current frame to obtain the sound field classification result of the current frame, and determines the initial coding scheme of the current frame based on the sound field classification result of the current frame.
  • the embodiment of the present application does not limit the method of sound field type analysis. For example, the encoding end performs singular value decomposition on the HOA signal of the current frame to perform sound field type analysis, or performs another linear decomposition on the HOA signal to perform sound field type analysis.
  • the sound field classification result includes the number of distinct sound sources.
  • One implementation in which the encoding end analyzes the sound field type of the HOA signal of the current frame to obtain the sound field classification result of the current frame is as follows: the encoding end performs singular value decomposition on the HOA signal of the current frame to obtain M singular values.
  • the encoding end determines the number of distinct sound sources corresponding to the current frame based on M-1 sound field classification parameters.
  • If the number of distinct sound sources corresponding to the current frame is greater than the first threshold and less than the second threshold, the encoder determines that the initial encoding scheme of the current frame is the second encoding scheme. If the number of distinct sound sources corresponding to the current frame is not greater than the first threshold or not less than the second threshold, the encoder determines that the initial encoding scheme of the current frame is the first encoding scheme.
  • the first threshold is smaller than the second threshold.
  • For example, the first threshold is 0 or another value, and the second threshold is 3 or another value.
  • the aforementioned first threshold and second threshold are preset values, which can be preset based on experience or through statistics.
  • the sound field classification result includes a sound field type, and the sound field types are divided into diffuse sound fields and distinct sound fields.
  • the sound field type may be determined according to the number of distinct sound sources obtained by the foregoing method, that is, the encoder determines the sound field type of the current frame based on the number of distinct sound sources corresponding to the current frame. For example, if the number of distinct sound sources corresponding to the current frame is greater than the first threshold and smaller than the second threshold, the encoder determines that the sound field type of the current frame is a distinct sound field. If the number of distinct sound sources corresponding to the current frame is not greater than the first threshold or not less than the second threshold, the encoder determines that the sound field type of the current frame is a diffuse sound field.
  • If the sound field type of the current frame is a distinct sound field, the encoder determines that the initial encoding scheme of the current frame is the second encoding scheme, that is, the MP-based HOA encoding scheme. If the sound field type of the current frame is a diffuse sound field, the encoding end determines that the initial encoding scheme of the current frame is the first encoding scheme, that is, the HOA encoding scheme based on DirAC.
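  • The threshold rule described above can be sketched as follows. This is an illustrative sketch only: the names `InitialScheme`, `sound_field_type` and `initial_scheme` are hypothetical, and the default thresholds 0 and 3 are simply the example values mentioned above, not required values. How the number of distinct sound sources is obtained is outside the scope of this sketch.

```python
from enum import Enum


class InitialScheme(Enum):
    FIRST = "DirAC-based HOA coding"    # first coding scheme
    SECOND = "MP-based HOA coding"      # second coding scheme


def sound_field_type(num_sources: int, first_threshold: int = 0,
                     second_threshold: int = 3) -> str:
    """Return 'distinct' if first_threshold < num_sources < second_threshold,
    otherwise 'diffuse'."""
    if first_threshold < num_sources < second_threshold:
        return "distinct"
    return "diffuse"


def initial_scheme(num_sources: int, first_threshold: int = 0,
                   second_threshold: int = 3) -> InitialScheme:
    """Map the sound field type to the initial coding scheme of the frame."""
    kind = sound_field_type(num_sources, first_threshold, second_threshold)
    return InitialScheme.SECOND if kind == "distinct" else InitialScheme.FIRST


for n in range(5):
    print(n, sound_field_type(n), initial_scheme(n).name)
```

  • With the example thresholds, only frames with 1 or 2 distinct sound sources select the second coding scheme; every other count selects the first coding scheme.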
  • The initial encoding scheme determined in this way may switch back and forth between audio frames, so that more switching frames ultimately need to be encoded. Because switching between encoding schemes introduces problems that must be solved, reducing the number of switching frames reduces the problems caused by switching.
  • the encoding end can first determine the expected encoding scheme of the current frame according to the sound field classification result of the current frame, that is, the encoding end uses the initial encoding scheme determined according to the aforementioned method as the expected encoding scheme. Then, the encoding end uses a sliding window method to update the initial encoding scheme of the current frame based on the expected encoding scheme, for example, the encoding end updates the initial encoding scheme of the current frame through hangover processing.
  • the sliding window includes the expected coding scheme of the current frame and the updated initial coding schemes of the previous N-1 frames of the current frame. If the cumulative number of second coding schemes in the sliding window is not less than the first specified threshold, the encoder updates the initial coding scheme of the current frame to the second coding scheme. If the cumulative number of second coding schemes in the sliding window is less than the first specified threshold, the encoder updates the initial coding scheme of the current frame to the first coding scheme.
  • the length N of the sliding window is 8, 10, 15, etc.
  • the first specified threshold is 5, 6, 7, etc. The embodiment of the present application does not limit the length of the sliding window and the value of the first specified threshold.
  • An example is as follows. Assume that the length of the sliding window is 10 and the first specified threshold is 7, so that the sliding window contains the expected coding scheme of the current frame and the updated initial coding schemes of the previous 9 frames of the current frame. If the cumulative number of second coding schemes in the sliding window is not less than 7, the encoding end updates the initial encoding scheme of the current frame to the second encoding scheme; if the cumulative number of second coding schemes in the sliding window is less than 7, the encoding end updates the initial encoding scheme of the current frame to the first encoding scheme.
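  • The window rule in this example can be sketched as a single function. This is a hedged illustration, not the implementation of the embodiment: the function name `update_initial_scheme` and the integer labels are hypothetical, and the caller is assumed to supply the updated initial coding schemes of exactly the previous N-1 frames (how the window is filled for the first few frames of a stream is not specified here).

```python
FIRST_SCHEME, SECOND_SCHEME = 1, 2  # hypothetical labels for the two initial schemes


def update_initial_scheme(expected_current, previous_updated,
                          window_len=10, threshold=7):
    """Return the updated initial coding scheme of the current frame.

    expected_current: expected coding scheme of the current frame.
    previous_updated: updated initial schemes of the previous window_len - 1 frames.
    """
    if len(previous_updated) != window_len - 1:
        raise ValueError("need the schemes of exactly window_len - 1 previous frames")
    # The sliding window is the previous updated decisions plus the current expectation.
    window = list(previous_updated) + [expected_current]
    count_second = window.count(SECOND_SCHEME)
    return SECOND_SCHEME if count_second >= threshold else FIRST_SCHEME


# Six of the previous nine frames used the second scheme.
history = [SECOND_SCHEME] * 6 + [FIRST_SCHEME] * 3
print(update_initial_scheme(SECOND_SCHEME, history))  # 7 in window -> 2
print(update_initial_scheme(FIRST_SCHEME, history))   # 6 in window -> 1
```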
  • alternatively, if the cumulative number of first coding schemes in the sliding window is not less than the second specified threshold, the encoder updates the initial coding scheme of the current frame to the first coding scheme. If the cumulative number of first coding schemes in the sliding window is less than the second specified threshold, the encoder updates the initial coding scheme of the current frame to the second coding scheme.
  • the second designated threshold value is 5, 6, 7 and other values, and the embodiment of the present application does not limit the value of the second designated threshold value.
  • the second specified threshold is different from or the same as the above-mentioned first specified threshold.
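  • The window decision described above can be sketched as follows. This is a minimal illustration, not the method of the application: the labels `FIRST` and `SECOND`, the function name `update_initial_scheme`, and the representation of the window as a plain list (the updated schemes of the previous N-1 frames followed by the predicted scheme of the current frame) are all assumptions made for the example. It implements the variant that counts second coding schemes against the first specified threshold.

```python
FIRST = "first"    # hypothetical label for the first (DirAC-based) coding scheme
SECOND = "second"  # hypothetical label for the second (MP-based) coding scheme


def update_initial_scheme(window, threshold=7):
    """Return the updated initial coding scheme of the current frame.

    `window` is assumed to hold N scheme labels: the updated initial schemes
    of the previous N-1 frames followed by the predicted scheme of the
    current frame. The current frame is set to SECOND when the number of
    SECOND entries is not less than `threshold`, and to FIRST otherwise.
    """
    count_second = sum(1 for scheme in window if scheme == SECOND)
    return SECOND if count_second >= threshold else FIRST


# Window of length 10 with seven SECOND entries: the threshold of 7 is reached.
window_a = [SECOND] * 6 + [FIRST] * 3 + [SECOND]
# Window of length 10 with six SECOND entries: the threshold is not reached.
window_b = [SECOND] * 5 + [FIRST] * 4 + [SECOND]
print(update_initial_scheme(window_a), update_initial_scheme(window_b))
```

  • The variant that counts first coding schemes against the second specified threshold would mirror this function with the two labels swapped.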
  • the encoder can also use other methods to obtain the sound field classification result of the current frame, and can also use other methods to determine the initial coding scheme based on the sound field classification result. This is not limited in the embodiments of the present application.
  • after the encoding end determines the initial encoding scheme of the current frame, if the initial encoding scheme of the current frame is the same as the initial encoding scheme of the previous frame of the current frame, the encoding end determines that the encoding scheme of the current frame is the initial encoding scheme of the current frame. If the initial encoding scheme of the current frame is different from the initial encoding scheme of the previous frame of the current frame, the encoder determines that the encoding scheme of the current frame is the third encoding scheme.
  • if the initial coding scheme of the current frame is the same as the initial coding scheme of the previous frame of the current frame and is the first coding scheme, the encoder determines that the coding scheme of the current frame is the first coding scheme. If the initial coding scheme of the current frame is the same as the initial coding scheme of the previous frame of the current frame and is the second coding scheme, the encoder determines that the coding scheme of the current frame is the second coding scheme. If one of the initial coding scheme of the current frame and the initial coding scheme of the previous frame of the current frame is the first coding scheme, and the other is the second coding scheme, the encoder determines that the coding scheme of the current frame is the third coding scheme.
  • that one of the initial coding scheme of the current frame and the initial coding scheme of the previous frame of the current frame is the first coding scheme and the other is the second coding scheme means either that the initial coding scheme of the current frame is the first coding scheme and the initial coding scheme of the previous frame is the second coding scheme, or that the initial coding scheme of the current frame is the second coding scheme and the initial coding scheme of the previous frame is the first coding scheme. That is, for a switching frame, the encoding end uses neither the first encoding scheme nor the second encoding scheme to encode the HOA signal of the switching frame, but uses the switching frame encoding scheme instead.
  • for a non-switching frame, the encoding end uses a coding scheme consistent with the initial coding scheme of the non-switching frame to encode the HOA signal of the non-switching frame.
  • an audio frame whose initial coding scheme is different from that of the previous frame is a switching frame
  • an audio frame whose initial coding scheme is the same as that of the previous frame is a non-switching frame.
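  • The rule for deriving the coding scheme of a frame from two consecutive initial coding schemes can be sketched as below. The labels and the function name `decide_coding_scheme` are assumptions introduced for illustration; the text does not say how the very first frame (which has no previous frame) is handled, so the sketch requires both arguments.

```python
FIRST = "first"    # hypothetical label: DirAC-based scheme
SECOND = "second"  # hypothetical label: MP-based scheme
THIRD = "third"    # hypothetical label: switching frame (hybrid) scheme


def decide_coding_scheme(initial_current, initial_previous):
    """Map the initial schemes of the current and previous frames to a scheme.

    A frame whose initial scheme differs from that of the previous frame is
    a switching frame and gets the third scheme; otherwise the frame keeps
    its own initial scheme.
    """
    if initial_current != initial_previous:
        return THIRD
    return initial_current


# A run of initial schemes and the scheme actually used for frames 1..n-1.
initial = [FIRST, FIRST, SECOND, SECOND, FIRST]
used = [decide_coding_scheme(cur, prev) for prev, cur in zip(initial, initial[1:])]
print(used)
```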
  • in addition to determining the encoding scheme of the current frame, the encoding end also needs to encode information that can indicate the encoding scheme of the current frame into the code stream, so that the decoding end can determine which decoding scheme to use to decode the code stream of the current frame.
  • the encoding end there are many ways for the encoding end to encode information capable of indicating the encoding scheme of the current frame into the code stream, and three implementation ways will be introduced next.
  • the first implementation: encoding a switching flag and indication information of two coding schemes
  • the indication information of the initial coding scheme is represented by a coding mode (coding mode) corresponding to the initial coding scheme, that is, the coding mode is used as the indication information.
  • the encoding mode corresponding to the initial encoding scheme is the initial encoding mode
  • the initial encoding mode is the first encoding mode (ie, the DirAC mode) or the second encoding mode (ie, the MP mode).
  • the preset indication information is a preset encoding mode
  • the preset encoding mode is a first encoding mode or a second encoding mode.
  • the preset indication information is other coding modes, that is, the specific indication information of the coding scheme of the switching frame encoded into the code stream is not limited.
  • the encoding end uses the switching flag to indicate the switching frame
  • the indication information of the coding scheme of the switching frame encoded into the code stream is not limited: it may be the initial encoding mode, may be a preset encoding mode, may be randomly selected from the first encoding mode and the second encoding mode, or may be other indication information.
  • the switching flag is used to indicate whether the current frame is a switching frame, so that the decoder can directly determine whether the current frame is a switching frame by obtaining the switching flag in the code stream.
  • the switching flag of the current frame and the indication information of the initial coding scheme each occupy one bit of the code stream.
  • the value of the switching flag of the current frame is "0" or "1", wherein the value of the switching flag is "0" indicating that the current frame is not a switching frame, that is, the value of the switching flag of the current frame is the first value.
  • the switching flag being "1" indicates that the current frame is a switching frame, that is, the value of the switching flag of the current frame is the second value.
  • the indication information of the initial encoding scheme is “0” or “1", wherein “0” indicates the DirAC mode (ie, the DirAC encoding scheme), and “1” indicates the MP mode (ie, the MP-based encoding scheme).
  • if the current frame is a switching frame, the encoding end determines that the value of the switching flag of the current frame is the second value, and encodes the value of the switching flag of the current frame into the code stream. That is, for the switching frame, since the switching flag in the code stream already indicates the switching frame, there is no need to encode the indication information of the coding scheme of the switching frame.
  • the encoding end encodes the indication information of the initial encoding scheme of the current frame into the code stream.
  • the indication information encoded into the code stream is substantially the coding mode consistent with the initial coding scheme, that is, the initial coding mode, and the initial coding mode is the first coding mode or the second coding mode.
  • the encoding end may not encode the switching flag.
  • the indication information of the initial encoding scheme occupies one bit of the code stream.
  • the coding mode encoded into the code stream is "0" or "1", where "0" indicates the DirAC mode, meaning that the initial coding scheme of the current frame is the first coding scheme, and "1" indicates the MP mode, meaning that the initial coding scheme of the current frame is the second coding scheme.
  • the third implementation: encoding indication information of three encoding schemes
  • the indication information of the coding scheme of the current frame occupies two bits of the code stream.
  • the indication information of the coding scheme of the current frame is "00", “01” or “10".
  • "00" indicates that the encoding scheme of the current frame is the first encoding scheme
  • "01” indicates that the encoding scheme of the current frame is the second encoding scheme
  • "10" indicates that the encoding scheme of the current frame is the third encoding scheme.
  • the encoding end determines the value of the switching flag, and encodes the value of the switching flag into the code stream.
  • the indication information of the initial encoding scheme of the current frame is encoded into the code stream; or, if the current frame is a switching frame, the encoding end encodes the preset indication information into the code stream, and if the current frame is a non-switching frame, the encoding end encodes the indication information of the initial coding scheme of the current frame into the code stream.
  • the encoder after determining the initial encoding scheme of the current frame, directly encodes the indication information of the initial encoding scheme of the current frame into the code stream.
  • after the encoding end determines the initial encoding scheme of the current frame, it determines the encoding scheme of the current frame based on the initial encoding scheme of the current frame and the initial encoding scheme of the previous frame of the current frame, and encodes the indication information of the encoding scheme of the current frame into the code stream.
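  • The three signaling implementations can be compared side by side with a small sketch. Bits are modeled as strings of "0" and "1"; the function names and scheme labels are assumptions made for the example. For the first implementation the sketch follows the variant in which a switching frame carries only the switching flag; the text also allows the initial or a preset indication to be written after the flag.

```python
FIRST, SECOND, THIRD = "first", "second", "third"  # hypothetical scheme labels

MODE_BIT = {FIRST: "0", SECOND: "1"}                 # "0": DirAC mode, "1": MP mode
MODE_BITS = {FIRST: "00", SECOND: "01", THIRD: "10"}  # two-bit indication


def signal_with_switch_flag(initial_scheme, is_switching_frame):
    """First implementation: one switching-flag bit, then one mode bit
    for non-switching frames only (assumed variant)."""
    if is_switching_frame:
        return "1"
    return "0" + MODE_BIT[initial_scheme]


def signal_initial_scheme_only(initial_scheme):
    """Second implementation: only the one-bit initial coding mode is written;
    the decoder detects switching frames by comparing with the previous frame."""
    return MODE_BIT[initial_scheme]


def signal_three_schemes(coding_scheme):
    """Third implementation: two bits directly identify one of three schemes."""
    return MODE_BITS[coding_scheme]


print(signal_with_switch_flag(SECOND, False),
      signal_with_switch_flag(SECOND, True),
      signal_initial_scheme_only(SECOND),
      signal_three_schemes(THIRD))
```

  • The second implementation spends the fewest bits per frame but relies on decoder-side state; the first and third make each frame self-describing.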
  • Step 602 If the coding scheme of the current frame is the third coding scheme, code the signal of the designated channel in the HOA signal into the code stream, and the designated channel is a part of all the channels of the HOA signal.
  • the encoding end encodes the HOA signal of the current frame according to the third encoding scheme (ie, the hybrid encoding scheme).
  • if the value of the switching flag of the current frame is the second value, it indicates that the current frame is a switching frame.
  • if the initial coding scheme of the current frame is different from the initial coding scheme of the previous frame of the current frame, it means that the current frame is a switching frame.
  • if the coding scheme of the current frame is the third coding scheme, the coding scheme of the current frame indicates that the current frame is a switching frame.
  • the encoding end adopts the third encoding scheme to encode the HOA signal of the current frame.
  • the third coding scheme indicates to code the signal of the specified channel in the HOA signal of the current frame into the code stream, wherein the specified channel is a part of all channels of the HOA signal.
  • the encoder encodes the signal of the specified channel in the HOA signal of the switching frame into the code stream instead of using the first coding scheme or the second coding scheme to encode the switching frame. That is, to make the auditory quality transition smoothly when coding schemes are switched, this scheme encodes switching frames with a compromise method.
  • the designated channel is consistent with a preset transmission channel in the first encoding scheme, that is, the designated channel is a preset channel. That is to say, under the premise that the third coding scheme is different from the second coding scheme, in order to make the coding effect of the third coding scheme close to that of the second coding scheme, the encoding end encodes into the code stream the signal, in the HOA signal of the switching frame, of the same channel as the preset transmission channel of the first coding scheme, so that the auditory quality transitions as smoothly as possible.
  • different transmission channels can be preset according to different encoding bandwidths, bit rates, and even application scenarios.
  • the preset transmission channels may also be the same.
  • the signals of the specified channel include FOA signals, and the FOA signals include omnidirectional W signals, and directional X signals, Y signals, and Z signals. That is to say, the specified channel includes the FOA channel, and the signal of the FOA channel is a low-order signal, that is, if the current frame is a switching frame, the encoding end encodes the low-order part of the HOA signal of the current frame into the code stream, and the low-order part is Including W signal, X signal, Y signal and Z signal of FOA channel.
  • the encoding end determines the virtual speaker signal and the residual signal based on the W signal, X signal, Y signal, and Z signal, and encodes the virtual speaker signal and the residual signal into the code stream.
  • the encoder determines the W signal as one virtual speaker signal, and either determines three residual signals based on the W signal, X signal, Y signal and Z signal, or determines the X signal, Y signal and Z signal as the three residual signals.
  • the encoding end determines the difference signal between any three signals of the W signal, the X signal, the Y signal, and the Z signal and the remaining signal as the three residual signals.
  • the encoding end determines the difference signals between the X signal, the Y signal, and the Z signal and the W signal as three residual signals.
  • the encoding end uses the difference signals X', Y', and Z' respectively obtained by X-W, Y-W, and Z-W as three-way residual signals.
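  • The derivation of one virtual speaker signal and three residual signals from the FOA channels can be illustrated as follows. This sketch covers only the variant in which W is the virtual speaker signal and the residuals are X-W, Y-W and Z-W; the function name `switching_frame_signals` and the use of plain Python lists for one frame of samples are assumptions for the example.

```python
def switching_frame_signals(w, x, y, z):
    """Return (virtual_speaker, [x_res, y_res, z_res]) for one frame.

    The W signal is used directly as the single virtual speaker signal, and
    each residual is the sample-wise difference between a directional signal
    and W, i.e. X' = X - W, Y' = Y - W, Z' = Z - W.
    """
    if not (len(w) == len(x) == len(y) == len(z)):
        raise ValueError("all four FOA signals must have the same length")
    virtual_speaker = list(w)
    residuals = [[d - o for d, o in zip(directional, w)]
                 for directional in (x, y, z)]
    return virtual_speaker, residuals


# Toy frame with three samples per channel.
vs, (x_res, y_res, z_res) = switching_frame_signals(
    w=[1.0, 2.0, 3.0], x=[1.5, 2.0, 2.0], y=[0.0, 2.0, 4.0], z=[1.0, 1.0, 1.0])
print(vs, x_res, y_res, z_res)
```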
  • if the encoder uses a core encoder to encode the current frame and the core encoder is a stereo encoder, then, since the determined one virtual speaker signal and three residual signals are all mono signals, the encoder first needs to combine these mono signals into stereo signals and then encode the stereo signals with the stereo encoder.
  • the encoding end combines the one virtual speaker signal with the first preset mono signal to obtain one stereo signal, and combines the three residual signals with the second preset mono signal to obtain two stereo signals.
  • the encoding end encodes the obtained three-way stereo signals into code streams respectively through a stereo encoder.
  • the embodiment of the present application does not limit the encoding end to combine the three residual signals and one preset mono signal to obtain a specific combination method of two stereo signals.
  • the encoding end combines the two most correlated residual signals among the three residual signals to obtain one of the two stereo signals, and combines the remaining residual signal with the second preset mono signal to obtain the other of the two stereo signals. That is to say, the encoding end combines signals according to correlation to obtain stereo signals.
  • the encoding end may also combine any two of the three residual signals to obtain one of the two stereo signals, and combine the remaining residual signal with the second preset mono signal to obtain the other of the two stereo signals.
  • the first preset monophonic signal in the embodiment of the present application is an all-zero signal or an all-ones signal
  • the second preset monophonic signal is an all-zero signal or an all-ones signal
  • the first preset mono signal is the same as or different from the second preset mono signal. That is, the first preset mono signal and the second preset mono signal are both all-zero signals or both all-ones signals; or the first preset mono signal is an all-zero signal and the second preset mono signal is an all-ones signal; or the first preset mono signal is an all-ones signal and the second preset mono signal is an all-zero signal.
  • the all-zero signal includes a signal whose sampling point values are all zero or a signal whose frequency point value is all zero, and the all-one signal includes a signal whose sampling point value is all one or a signal whose frequency point value is all one.
  • if the HOA signal is a time-domain signal, the all-zero signal includes a signal whose sampling point values are all zero, and the all-ones signal includes a signal whose sampling point values are all one.
  • if the HOA signal is a frequency-domain signal, the all-zero signal includes a signal whose frequency point values are all zero, and the all-ones signal includes a signal whose frequency point values are all one.
  • the first preset mono signal and/or the second preset mono signal may also be preset signals in other forms.
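  • The correlation-based pairing of mono signals into three stereo inputs can be sketched as follows. The application does not specify how correlation is measured, so the normalized absolute inner product used in `correlation` is an assumption, as are the function names and the choice of all-zero preset mono signals.

```python
import math
from itertools import combinations


def correlation(a, b):
    """Assumed measure: absolute normalized inner product of two signals."""
    energy = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return abs(sum(p * q for p, q in zip(a, b))) / energy if energy else 0.0


def pair_for_stereo(virtual_speaker, residuals):
    """Return three (left, right) pairs for a stereo core encoder.

    Pair 1: the virtual speaker signal with the first preset mono signal.
    Pair 2: the two most correlated residual signals.
    Pair 3: the remaining residual with the second preset mono signal.
    Both preset mono signals are all-zero signals in this sketch.
    """
    n = len(virtual_speaker)
    preset_1 = [0.0] * n
    preset_2 = [0.0] * n
    i, j = max(combinations(range(3), 2),
               key=lambda ij: correlation(residuals[ij[0]], residuals[ij[1]]))
    k = ({0, 1, 2} - {i, j}).pop()  # index of the residual left unpaired
    return [(virtual_speaker, preset_1),
            (residuals[i], residuals[j]),
            (residuals[k], preset_2)]


pairs = pair_for_stereo([1.0, 1.0, 1.0],
                        [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [2.0, 0.0, 0.0]])
print(pairs[1], pairs[2])
```

  • In this toy input the first and third residuals are proportional, so they form the residual stereo pair and the second residual is padded with the preset signal.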
  • the encoding end uses the mono encoder to encode the one virtual speaker signal and each of the three residual signals into the code stream separately.
  • Fig. 7 is a schematic diagram of a switching frame coding scheme provided by an embodiment of the present application.
  • the current frame to be encoded is a switching frame
  • the encoding end obtains the HOA signal of the current frame, uses the W signal in the HOA signal as a virtual speaker signal, and determines the residual signal according to the FOA signal in the HOA signal, as shown in
  • the residual signal is determined according to the X, Y, and Z signals in the HOA signal, or the residual signal is determined according to the W signal and the X, Y, and Z signals.
  • the encoding end encodes the determined virtual speaker signal and residual signal into the code stream through the core encoder, so as to obtain the code stream of the switching frame.
  • the encoding end determines two channels of signals among W signal, X signal, Y signal and Z signal as two channels of virtual speaker signals, and determines the remaining two channels of signals as two channels of residual signals.
  • the encoding end combines the two channels of virtual speaker signals to obtain one channel of stereo signals, and combines the two channels of residual signals to obtain another channel of stereo signals.
  • the encoding end encodes the obtained two-way stereo signals into code streams respectively through a stereo encoder.
  • the embodiment of the present application does not limit the specific combination manner in which the encoding end combines the W signal, the X signal, the Y signal, and the Z signal in pairs to obtain two stereo signals.
  • the encoding end determines the W signal as one virtual speaker signal, and determines the signal most correlated with the W signal among the X signal, Y signal, and Z signal as the other virtual speaker signal. That is, among the four signals of the FOA channels, the W signal is combined with the signal most correlated with it, and the remaining two signals are combined with each other.
  • the encoding end combines any two signals of W signal, X signal, Y signal and Z signal to obtain one stereo signal, and combines the remaining two signals to obtain another stereo signal.
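  • For the two-virtual-speaker variant, the choice of which directional signal joins W can be sketched as below. As before, the correlation measure is an assumption (the application does not define one), and the function name `pair_foa_signals` and the channel-name return format are hypothetical.

```python
import math


def correlation(a, b):
    """Assumed measure: absolute normalized inner product of two signals."""
    energy = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return abs(sum(p * q for p, q in zip(a, b))) / energy if energy else 0.0


def pair_foa_signals(w, x, y, z):
    """Return (virtual_speaker_pair, residual_pair) as tuples of channel names.

    W and the directional signal most correlated with W form the stereo pair
    of virtual speaker signals; the two remaining directional signals form
    the stereo pair of residual signals.
    """
    directional = {"X": x, "Y": y, "Z": z}
    best = max(directional, key=lambda name: correlation(w, directional[name]))
    rest = tuple(name for name in directional if name != best)
    return ("W", best), rest


print(pair_foa_signals(w=[1.0, 0.0], x=[0.0, 1.0], y=[1.0, 0.0], z=[0.0, 1.0]))
```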
  • the embodiment of the present application does not limit the specific implementation manner in which the encoding end uses the core encoder to encode the virtual speaker signal and the residual signal, for example, does not limit the number of encoding bits corresponding to the virtual speaker signal and the residual signal.
  • the above describes the process of encoding the current frame at the encoding end when the current frame is a switching frame, that is, the encoding end encodes the signal of the specified channel in the HOA signal of the switching frame into the code stream according to the third encoding scheme.
  • the third encoding scheme is the switching frame encoding scheme.
  • the signal of the specified channel may include the W signal, which is a core signal of the HOA signal.
  • the switching frame coding scheme can also be called an MP-W-based coding scheme.
  • if the encoding scheme of the current frame is the first encoding scheme, the encoding end encodes the HOA signal of the current frame into the code stream according to the first encoding scheme. If the encoding scheme of the current frame is the second encoding scheme, the encoding end encodes the HOA signal of the current frame into the code stream according to the second encoding scheme. That is, if the current frame is not a switching frame, the encoding end uses the initial encoding scheme of the current frame to encode the current frame.
  • the encoding end encodes the HOA signal of the current frame into the code stream according to the second encoding scheme as follows: the encoding end selects, based on the MP algorithm, a target virtual speaker that matches the HOA signal of the current frame from the virtual speaker set; determines the virtual speaker signal through the MP-based spatial encoder based on the HOA signal of the current frame and the target virtual speaker; determines the residual signal through the MP-based spatial encoder based on the HOA signal of the current frame and the virtual speaker signal; and encodes the virtual speaker signal and the residual signal into the code stream through the core encoder.
  • the encoding end encodes the HOA signal of the current frame into the code stream according to the first encoding scheme as follows: the encoding end extracts the core layer signal and spatial parameters from the HOA signal of the current frame, and encodes the extracted core layer signal and spatial parameters into the code stream.
  • the encoding end extracts the core layer signal from the HOA signal of the current frame through the core encoded signal acquisition module, extracts the spatial parameters from the HOA signal of the current frame through the DirAC-based spatial parameter extraction module, encodes the core layer signal into the code stream through the core encoder, and encodes the spatial parameters into the code stream through the spatial parameter encoder.
  • the channel corresponding to the core layer signal is consistent with the specified channel in this solution.
  • the extracted spatial parameters are also encoded into the code stream.
  • the spatial parameters include rich scene information, such as direction information. It can be seen that, for the same frame, the effective information encoded into the code stream by using the DirAC-based HOA coding scheme will be more than the effective information encoded into the code stream by using the switching frame coding scheme.
  • the switching frame coding scheme also encodes the signal of the transmission channel preset by the first coding scheme in the HOA signal into the code stream, but it does not encode any further information in the HOA signal beyond the signal of the specified channel. That is, the spatial parameters are neither extracted nor encoded into the code stream, so that the auditory quality transitions as smoothly as possible.
  • FIG. 10 is a flow chart of another encoding method provided by the embodiment of the present application.
  • the encoder first acquires the HOA signal of the current frame to be encoded. Then, the encoding end analyzes the sound field type of the HOA signal to determine the initial encoding scheme of the current frame, and the encoding end encodes the indication information of the initial encoding scheme of the current frame into the code stream. The encoder determines whether the initial encoding scheme of the current frame is the same as that of the previous frame.
  • if the initial encoding scheme of the current frame is the same as the initial encoding scheme of the previous frame, the encoding end uses the initial encoding scheme of the current frame to encode the HOA signal of the current frame to obtain the code stream of the current frame. If the initial encoding scheme of the current frame is different from the initial encoding scheme of the previous frame, the encoding end uses the switching frame encoding scheme to encode the HOA signal of the current frame to obtain the code stream of the current frame.
  • the initial encoding scheme of the current frame is the first encoding scheme or the second encoding scheme, and the encoder uses the initial encoding scheme of the current frame to encode the HOA signal of the current frame into the code stream.
  • the HOA signal of the audio frame is encoded and decoded by combining two schemes (namely, a codec scheme based on virtual speaker selection and a codec scheme based on directional audio coding), that is, an appropriate codec scheme is selected for different audio frames, which can improve the compression rate of the audio signal.
  • for some audio frames, neither of the above two schemes is used directly; instead, a new codec scheme is used to encode and decode these audio frames, that is, the signal of the specified channel in the HOA signal of these audio frames is encoded into the code stream. In other words, a compromise scheme is used for encoding and decoding, so that the auditory quality of the HOA signal recovered by decoding transitions smoothly after rendering and playback.
  • FIG. 11 is a flow chart of a decoding method provided by an embodiment of the present application, and the method is applied to a decoding end. It should be noted that this decoding method corresponds to the encoding method shown in FIG. 6 . Please refer to FIG. 11 , the method includes the following steps.
  • Step 1101 Obtain the decoding scheme of the current frame based on the code stream.
  • the decoding scheme of the current frame is one of the first decoding scheme, the second decoding scheme and the third decoding scheme.
  • the first decoding scheme is an HOA decoding scheme based on DirAC
  • the second decoding scheme is an HOA decoding scheme based on virtual speaker selection
  • the third decoding scheme is a hybrid decoding scheme.
  • the hybrid decoding scheme is also referred to as a switching frame decoding scheme.
  • the decoding end since the encoding end uses different encoding schemes for encoding different audio frames, the decoding end also needs to use a corresponding decoding scheme to decode each audio frame.
  • in step 601 of the encoding method shown in FIG. 6, three implementations are introduced in which the encoding end encodes information that can be used to indicate the encoding scheme of the current frame into the code stream. The decoding end determines the decoding scheme of the current frame in a corresponding manner, which will be introduced next.
  • the first implementation: the encoding end encodes the switching flag and the indication information of the two encoding schemes
  • the decoder first parses out the value of the switching flag of the current frame from the code stream. If the value of the switching flag is the first value, the decoding end parses the indication information of the decoding scheme of the current frame from the code stream, and the indication information is used to indicate that the decoding scheme of the current frame is the first decoding scheme or the second decoding scheme. If the value of the switching flag is the second value, the decoding end determines that the decoding scheme of the current frame is the third decoding scheme. It should be noted that the indication information of the encoding scheme encoded into the code stream by the encoding end is the indication information of the decoding scheme parsed from the code stream by the decoding end.
  • if the decoding end parses out that the value of the switching flag of the current frame is the first value, it means that the current frame is a non-switching frame.
  • the decoding end then parses out the indication information of the decoding scheme from the code stream, and determines the decoding scheme of the current frame based on the indication information. If the decoding end parses out that the value of the switching flag of the current frame is the second value, it means that the current frame is a switching frame, and even if the code stream contains the indication information, the decoding end does not need to decode the indication information.
  • the decoding end determines that the decoding scheme of the current frame is the switching frame decoding scheme, that is, the current frame is a switching frame. The switching frame decoding scheme is a decoding scheme different from the first decoding scheme and the second decoding scheme, and it serves to make the auditory quality transition smoothly.
  • the indication information of the decoding scheme and the switching flag each occupy one bit of the code stream.
  • the decoder first parses the value of the switching flag of the current frame from the code stream. If the parsed value of the switching flag is "0", that is, the value of the switching flag is the first value, the decoding end then parses the indication information of the decoding scheme of the current frame from the code stream. If the parsed indication information is "0", the decoding end determines that the decoding scheme of the current frame is the first decoding scheme. If the parsed indication information is "1", the decoding end determines that the decoding scheme of the current frame is the second decoding scheme. If the parsed value of the switching flag is "1", the decoding end determines that the decoding scheme of the current frame is the switching frame decoding scheme (the third decoding scheme).
  • the second implementation: the encoding end encodes the indication information of two encoding schemes
  • the decoding end parses out the initial decoding scheme of the current frame from the code stream, and the initial decoding scheme is the first decoding scheme or the second decoding scheme. If the initial decoding scheme of the current frame is the same as the initial decoding scheme of the previous frame of the current frame, it is determined that the decoding scheme of the current frame is the initial decoding scheme of the current frame. If the initial decoding scheme of the current frame is different from the initial decoding scheme of the previous frame of the current frame, it is determined that the decoding scheme of the current frame is a third decoding scheme, that is, a hybrid decoding scheme.
  • the initial decoding scheme of the current frame is different from the initial decoding scheme of the previous frame of the current frame means that the initial decoding scheme of the current frame is the first decoding scheme and the initial decoding scheme of the previous frame of the current frame is the second decoding scheme , or, the initial decoding scheme of the current frame is the second decoding scheme and the initial decoding scheme of the previous frame of the current frame is the first decoding scheme. That is, one of the initial decoding scheme of the current frame and the initial decoding scheme of the previous frame of the current frame is the first decoding scheme, and the other is the second decoding scheme.
  • the indication information used to indicate the initial encoding scheme occupies one bit of the code stream, and taking the encoding mode as the indication information as an example, the encoding mode in the code stream occupies one bit.
  • the decoding end parses the indication information of the initial encoding scheme of the current frame from the code stream. If the parsed indication information is "0" and the indication information of the previous frame of the current frame is also "0", the decoding end determines that the decoding scheme of the current frame is the first decoding scheme. If the parsed indication information is "1" and the indication information of the previous frame of the current frame is also "1", the decoding end determines that the decoding scheme of the current frame is the second decoding scheme.
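  • if the parsed indication information differs from the indication information of the previous frame of the current frame, the decoding end determines that the decoding scheme of the current frame is the switching frame decoding scheme.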
  • the indication information of the initial decoding scheme of the previous frame of the current frame is cached data.
  • the decoding end may acquire the indication information of the initial decoding scheme of the previous frame of the current frame from the cache.
  • the third implementation: the encoding end encodes the indication information of three encoding schemes
  • the decoding end parses the indication information of the decoding scheme of the current frame from the code stream, and the indication information is used to indicate that the decoding scheme of the current frame is the first decoding scheme, the second decoding scheme or the third decoding scheme.
  • the indication information of the decoding scheme occupies two bits of the code stream.
  • the coding mode of the current frame occupies two bits of the code stream.
  • the decoding end parses the indication information of the decoding scheme of the current frame from the code stream. If the parsed indication information is "00", the decoding end determines that the decoding scheme of the current frame is the first decoding scheme. If the parsed indication information is "01", the decoding end determines that the decoding scheme of the current frame is the second decoding scheme. If the parsed indication information is "10", the decoding end determines that the decoding scheme of the current frame is the switching frame decoding scheme.
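  • A minimal sketch of the two-bit mapping described above. The code stream is modelled as a string of "0"/"1" characters purely for illustration, and rejecting the code "11" is an assumption, since the text does not assign it a meaning.

```python
from typing import Tuple

# Two-bit indication information, as listed in the text.
SCHEME_BY_CODE = {"00": "first", "01": "second", "10": "switching"}


def parse_scheme_indication(bitstream: str, offset: int = 0) -> Tuple[str, int]:
    """Read two bits at `offset`; return the decoding scheme and the next read offset."""
    code = bitstream[offset:offset + 2]
    if code not in SCHEME_BY_CODE:
        # Assumption: "11" (and a truncated stream) is treated as an error.
        raise ValueError(f"unassigned or truncated indication: {code!r}")
    return SCHEME_BY_CODE[code], offset + 2
```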
  • Step 1102: If the decoding scheme of the current frame is the third decoding scheme, determine the signal of the specified channel in the HOA signal of the current frame based on the code stream, where the specified channel is a part of all channels of the HOA signal.
  • the decoding end determines the signal of the specified channel in the HOA signal of the current frame based on the code stream. That is to say, for a switching frame, the encoding end encodes the signal of the specified channel into the code stream, and the decoding end then decodes the switching frame using the switching frame decoding scheme, that is, it first needs to parse the signal of the specified channel from the code stream.
  • Next, the implementation process by which the decoding end determines the signal of the specified channel in the HOA signal of the current frame based on the code stream is introduced.
  • the process of the decoding end determining the signal of the specified channel in the HOA signal of the current frame based on the code stream is symmetrical to the process of encoding the signal of the specified channel in the HOA signal of the current frame into the code stream at the encoding end.
  • some implementation processes of encoding the signal of the specified channel into the code stream are introduced, and the decoding process corresponding to these implementation processes will be introduced at the decoding end.
  • if the encoding end first determines the virtual speaker signal and the residual signal based on the signal of the specified channel, and then encodes the virtual speaker signal and the residual signal into the code stream, then, correspondingly, the decoding end first determines the virtual speaker signal and the residual signal based on the code stream, and then determines the signal of the specified channel based on the virtual speaker signal and the residual signal.
  • the decoding end decodes the code stream through the stereo decoder to obtain three stereo signals, and then determines one virtual speaker signal and three residual signals based on the three stereo signals.
  • the decoder determines one virtual speaker signal based on one of the three stereo signals, and determines three residual signals based on the other two of the three stereo signals. That is, the decoder first parses the three stereo signals from the code stream, and then disassembles the three stereo signals to obtain a virtual speaker signal and three residual signals.
  • the decoding end parses three stereo signals S1, S2 and S3 from the code stream, where S1 is obtained by combining one virtual speaker signal and one preset mono signal, S2 is obtained by combining two residual signals, and S3 is obtained by combining the remaining one residual signal and one preset mono signal.
  • the decoder disassembles S1 to obtain one virtual speaker signal, disassembles S2 to obtain two residual signals, and disassembles S3 to obtain the remaining one residual signal.
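  • The disassembly of S1, S2 and S3 can be sketched as below. Each stereo signal is modelled as a pair of sample lists. Which channel of S1 and S3 carries the preset mono signal is not stated in the text, so placing it in the second channel is an assumption, as are all names.

```python
from typing import List, Sequence, Tuple

Mono = List[float]
Stereo = Tuple[Sequence[float], Sequence[float]]


def disassemble_stereo_signals(s1: Stereo, s2: Stereo, s3: Stereo) -> Tuple[Mono, List[Mono]]:
    """Split S1, S2 and S3 into one virtual speaker signal and three residual signals."""
    # Assumption: the second channel of S1 and S3 holds the preset mono signal,
    # which carries no information and is discarded.
    virtual_speaker, _preset_1 = s1
    residual_1, residual_2 = s2
    residual_3, _preset_2 = s3
    return list(virtual_speaker), [list(residual_1), list(residual_2), list(residual_3)]
```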
  • the decoding end decodes the code stream through the mono decoder to obtain four mono signals, and the four mono signals include the one virtual speaker signal and the three residual signals.
  • the decoding end determines the virtual speaker signal and the residual signal based on the code stream, and then determines the W signal based on the virtual speaker signal.
  • the decoding end determines the X signal, the Y signal and the Z signal based on the residual signal and the W signal, or the decoding end determines the X signal, the Y signal and the Z signal based on the residual signal.
  • the decoding end parses three residual signals
  • the sums of the three residual signals and the W signal are respectively determined as the X signal, the Y signal and the Z signal, or the three residual signals are respectively determined as the X signal, the Y signal and the Z signal.
  • the decoding end determines the difference signals between the X signal, the Y signal and the Z signal and the W signal as three residual signals
  • the decoding end determines the sum of the three residual signals and the W signal as X signal, Y signal and Z signal.
  • the decoding end determines the X signal, the Y signal and the Z signal as three residual signals
  • the decoding end determines the three residual signals as the X signal, the Y signal and the Z signal respectively. That is, the decoding process at the decoding end matches the encoding process at the encoding end.
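  • Both decoding variants described above (residuals as differences from W, or residuals as X, Y and Z directly) can be sketched in one function. Taking the W signal to be the virtual speaker signal itself is an assumption made for illustration; the text only says that W is determined based on the virtual speaker signal. The flag name is likewise hypothetical.

```python
from typing import List, Sequence, Tuple

Mono = List[float]


def reconstruct_foa(
    virtual_speaker: Sequence[float],
    residuals: Sequence[Sequence[float]],
    residuals_are_differences: bool,
) -> Tuple[Mono, Mono, Mono, Mono]:
    """Rebuild W, X, Y, Z from one virtual speaker signal and three residual signals."""
    if len(residuals) != 3:
        raise ValueError("expected exactly three residual signals")
    # Assumption: W is taken directly from the virtual speaker signal.
    w = list(virtual_speaker)
    if residuals_are_differences:
        # Encoder sent X-W, Y-W, Z-W, so the decoder adds W back.
        x, y, z = ([r + wv for r, wv in zip(res, w)] for res in residuals)
    else:
        # Encoder sent X, Y, Z unchanged as the residual signals.
        x, y, z = (list(res) for res in residuals)
    return w, x, y, z
```

  • The flag must match the choice made at the encoding end, which is what the text means by the decoding process matching the encoding process.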
  • the decoding end decodes the code stream through the stereo decoder to obtain the two stereo signals.
  • the decoder determines two channels of virtual speaker signals based on one of the two channels of stereo signals, and determines two channels of residual signals based on the other channel of the two channels of stereo signals.
  • the two channels of virtual speaker signals and the two channels of residual signals include the W signal, the X signal, the Y signal and the Z signal.
  • the two virtual speaker signals determined by the decoding end include W signal and the signal with the highest correlation with W signal among X signal, Y signal and Z signal.
  • the signal with the highest correlation with the W signal among the X signal, Y signal, and Z signal is the X signal
  • the two virtual speaker signals determined by the decoder include the W signal and the X signal
  • the two residual signals determined by the decoder include Y signal and Z signal.
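  • How the four FOA signals might be split into two virtual speaker signals and two residual signals can be sketched as follows. The text does not define the correlation measure, so the absolute normalized cross-correlation used here is an assumption, as are the function names.

```python
import math
from typing import List, Sequence, Tuple


def normalized_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-lag normalized cross-correlation of two equal-length signals."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return 0.0 if den == 0.0 else num / den


def split_foa_channels(w, x, y, z) -> Tuple[List[str], List[str]]:
    """Return (virtual speaker channel names, residual channel names)."""
    candidates = {"X": x, "Y": y, "Z": z}
    # Assumption: "highest correlation" is judged on the absolute value.
    best = max(candidates, key=lambda name: abs(normalized_correlation(w, candidates[name])))
    residual = [name for name in candidates if name != best]
    return ["W", best], residual
```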
  • Step 1103: Based on the signal of the specified channel, determine the gain of one or more remaining channels in the HOA signal of the current frame except the specified channel.
  • after the decoding end determines the signal of the specified channel in the HOA signal of the current frame based on the code stream, it determines, based on the signal of the specified channel, the gain of one or more remaining channels in the HOA signal except for the specified channel.
  • the FOA channel may be called a low-order channel
  • the signal of the FOA channel may be called a low-order part of the HOA signal
  • one or more remaining channels in the HOA signal other than the specified channel are called high-order channels
  • the signal of the high-order channel can be called the high-order part of the HOA signal.
  • the decoder determines the high-order gain of the HOA signal based on the low-order part of the HOA signal, that is, the gain of the high-order channel.
  • the decoding end first performs analysis filtering on the signal of the specified channel in the HOA signal to obtain the analysis-filtered signal of the specified channel, and determines the gain of the one or more remaining channels based on the analysis-filtered signal of the specified channel. For example, assuming that the signal of the specified channel is the low-order part of the HOA signal, the decoding end first performs analysis filtering on the low-order part of the HOA signal to obtain the analysis-filtered low-order part, and then estimates the high-order gain based on the analysis-filtered low-order part of the HOA signal.
  • the analysis filter used by the decoding end for analysis filtering is the same as the analysis filter used in the DirAC-based HOA decoding solution, which makes the decoding delay of the switching frame consistent with the decoding delay of the DirAC-based HOA decoding scheme, that is, the delays are aligned.
  • the decoding delay mentioned in this article refers to the end-to-end codec delay, and the decoding delay may also be referred to as encoding delay.
  • the specific implementation by which the decoding end determines the gain of one or more remaining channels in the HOA signal other than the specified channel based on the signal of the specified channel, that is, estimates the remaining channel gain based on the signal of the specified channel, is the same as the remaining channel gain estimation method in the DirAC-based codec solution, and is not described in detail in the embodiment of the present application.
  • the method for estimating the high-order gain based on the low-order part of the HOA signal at the decoding end is the same as the method for estimating the high-order gain in the codec solution based on DirAC.
  • Step 1104: Based on the signal of the specified channel and the gain of the one or more remaining channels, determine the signal of each remaining channel in the one or more remaining channels.
  • the decoding end determines the signal of each remaining channel in the one or more remaining channels based on the signal of the specified channel and the gain of the one or more remaining channels.
  • the decoding end can determine the high-order part of the HOA signal based on the W signal in the low-order part and the high-order gain.
  • the decoding end can determine the high-order part of the analysis-filtered HOA signal based on the W signal in the low-order part of the analysis-filtered HOA signal and the high-order gain.
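  • Step 1104 can be illustrated with a deliberately simplified sketch in which each remaining channel is the W signal scaled by that channel's gain. The text does not give the formula; a single scalar gain per channel per frame, applied in the time domain, is an assumption made here. A DirAC-style decoder would more plausibly apply gains per band in the filter-bank domain.

```python
from typing import List, Sequence


def estimate_high_order_channels(w: Sequence[float], gains: Sequence[float]) -> List[List[float]]:
    """Build one signal per remaining (high-order) channel from W and its gain."""
    # Assumption: one scalar gain per remaining channel for the whole frame.
    return [[gain * sample for sample in w] for gain in gains]
```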
  • Step 1105: Obtain the reconstructed HOA signal of the current frame based on the signal of the specified channel and the signals of the one or more remaining channels.
  • after obtaining the signal of the specified channel and the signals of the one or more remaining channels, the decoding end obtains the reconstructed HOA signal of the current frame based on the signal of the specified channel and the signals of the one or more remaining channels, that is, it reconstructs the HOA signal of the current frame.
  • the decoding end performs synthesis filtering processing on the signal of the designated channel and the signals of the one or more remaining channels, so as to obtain the reconstructed HOA signal of the current frame.
  • the decoding end can perform synthesis filtering on the low-order part and the high-order part of the HOA signal to obtain the reconstructed HOA signal of the current frame.
  • if the decoding end has performed analysis filtering on the low-order part of the HOA signal, the decoding end performs synthesis filtering on the analysis-filtered low-order part of the HOA signal and the analysis-filtered high-order part of the HOA signal to obtain the reconstructed HOA signal of the current frame.
  • the synthesis filter used by the decoding end to perform synthesis filtering is the same as the synthesis filter used in the DirAC-based HOA codec scheme, which makes the decoding delay of the switching frame consistent with the decoding delay of the DirAC-based HOA decoding scheme, that is, the delays are aligned.
  • Fig. 12 is a schematic diagram of a switching frame decoding solution provided by an embodiment of the present application.
  • assuming that the current frame to be decoded is a switching frame and that the signal of the specified channel is the low-order part of the HOA signal, then, during the decoding process, the decoding end obtains the code stream of the current frame to be decoded, performs core decoding on the code stream to reconstruct the low-order part of the HOA signal of the current frame, and uses a method similar to that of determining the high-order part in the DirAC-based HOA decoding scheme to estimate the high-order part based on the low-order part, that is, to reconstruct the high-order part of the HOA signal. Afterwards, the decoding end reconstructs the HOA signal based on the low-order part obtained through decoding and the high-order part obtained through estimation.
  • the above describes the process of decoding the current frame at the decoding end when the current frame is a switching frame, that is, the decoding end uses the switching frame decoding scheme to decode the switching frame, that is, the decoding end first decodes the signal of the specified channel in the HOA signal (such as low-order part), and then reconstruct the signal of each remaining channel (such as reconstructing the high-order part).
  • the process of decoding the current frame at the decoding end will be introduced.
  • after the decoding end determines the decoding scheme of the current frame, if the decoding scheme of the current frame is the first decoding scheme, the decoding end obtains the reconstructed HOA signal of the current frame from the code stream according to the first decoding scheme. If the decoding scheme of the current frame is the second decoding scheme, the decoding end obtains the reconstructed HOA signal of the current frame from the code stream according to the second decoding scheme.
  • the implementation process by which the decoding end obtains the reconstructed HOA signal of the current frame from the code stream according to the second decoding scheme is as follows: the decoding end parses the virtual speaker signal and the residual signal from the code stream through the core decoder, and sends the parsed virtual speaker signal and residual signal to the MP-based spatial decoder to obtain the reconstructed HOA signal of the current frame.
  • the decoding scheme shown in FIG. 13 corresponds to the encoding scheme shown in FIG. 8 .
  • according to the first decoding scheme, the implementation process of obtaining the reconstructed HOA signal of the current frame from the code stream is as follows: the decoding end parses the core layer signal and the spatial parameters from the code stream, and reconstructs the HOA signal of the current frame based on the core layer signal and the spatial parameters.
  • the decoding end parses the core layer signal from the code stream through the core decoder, parses the spatial parameters from the code stream through the spatial parameter decoder, and performs DirAC-based HOA signal synthesis processing based on the parsed core layer signal and spatial parameters to obtain the reconstructed HOA signal of the current frame.
  • the decoding scheme shown in FIG. 14 corresponds to the encoding scheme shown in FIG. 9 .
  • the decoding end obtains the current
  • gain adjustment may also be performed on the high-order part of the current frame.
  • the decoding end obtains the initial HOA signal from the code stream according to the second decoding scheme. If the decoding scheme of the previous frame of the current frame is the third decoding scheme, the decoding end performs gain adjustment on the high-order part of the initial HOA signal according to the high-order gain of the previous frame of the current frame.
  • the decoding end obtains the reconstructed HOA signal of the current frame based on the low-order part of the initial HOA signal and the gain-adjusted high-order part.
  • using the high-order gain of the previous frame to perform gain adjustment on the high-order part of the initial HOA signal of the current frame makes the gain-adjusted high-order part of the current frame similar to the high-order part of the previous frame; for example, the gain adjustment makes the energy of the high-order parts of the HOA signals in two adjacent frames similar. In this way, when the decoding end subsequently renders and plays each audio frame, the auditory quality can transition smoothly between the switching frame and the frame following the switching frame.
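  • One possible form of this gain adjustment is sketched below. The text does not specify the adjustment rule, so everything here is an assumption: each high-order channel of the current frame is rescaled so that its energy equals the squared high-order gain of the previous frame times the energy of the current W signal. The names `adjust_high_order_gain` and `prev_gains` are hypothetical.

```python
import math
from typing import List, Sequence


def energy(signal: Sequence[float]) -> float:
    """Sum of squared samples."""
    return sum(s * s for s in signal)


def adjust_high_order_gain(
    w: Sequence[float],
    high_order: Sequence[Sequence[float]],
    prev_gains: Sequence[float],
) -> List[List[float]]:
    """Rescale each high-order channel toward the level implied by the previous frame's gain."""
    adjusted = []
    for channel, prev_gain in zip(high_order, prev_gains):
        current = energy(channel)
        # Assumption: target energy = (previous-frame gain)^2 * energy of current W.
        target = (prev_gain ** 2) * energy(w)
        scale = 1.0 if current == 0.0 else math.sqrt(target / current)
        adjusted.append([scale * s for s in channel])
    return adjusted
```

  • Matching energies in this way is only one reading of "similar"; a smoother alternative would interpolate the scale factor across the frame.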
  • the decoding end can also perform gain adjustment on the high-order part of the HOA signals of these audio frames, and the embodiment of the present application does not limit the specific implementation manner of performing gain adjustment on the high-order part of the HOA signals of these audio frames.
  • the decoding end may also perform gain adjustment on other parts of the HOA signal of these audio frames. That is, the embodiment of the present application does not limit which channel signals of the HOA signal are to be adjusted for gain.
  • the decoding end can adjust the gain of any one or more channels in the HOA signal, and the one or more channels can include part or all of the high-order channels, or part or all of the remaining channels except the specified channel, or other channels.
  • Fig. 15 is a flow chart of another decoding method provided by the embodiment of the present application. Referring to Fig. 15, taking as an example the case in which the encoding end encodes the indication information of the initial encoding scheme into the code stream, and assuming that the switching flag is not encoded in the code stream, in the decoding process the decoding end first parses the indication information of the initial decoding scheme of the current frame from the code stream. Then, the decoding end judges whether the initial decoding scheme of the current frame is the same as that of the previous frame.
  • if the initial decoding scheme of the current frame is the same as the initial decoding scheme of the previous frame, the decoding end uses the initial decoding scheme of the current frame to decode the code stream to obtain the reconstructed HOA signal of the current frame. If the initial decoding scheme of the current frame is different from the initial decoding scheme of the previous frame, the current frame is a switching frame, and the decoding end uses the switching frame decoding scheme to decode the code stream to obtain the reconstructed HOA signal of the current frame.
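  • The frame-by-frame flow of Fig. 15 can be sketched as a loop that caches the previous frame's indication bit and dispatches to one of three decoders. The `decoders` mapping and its callables are hypothetical placeholders for the actual first, second and switching frame decoding schemes, and treating the first frame as a non-switching frame is an assumption.

```python
from typing import Callable, Dict, List, Optional, Sequence


def decode_stream(
    indication_bits: Sequence[int],
    decoders: Dict[str, Callable[[int], object]],
) -> List[object]:
    """Decode frames in order, choosing the scheme from current and cached indication bits."""
    outputs = []
    prev_bit: Optional[int] = None  # cached indication of the previous frame
    for frame_index, bit in enumerate(indication_bits):
        # Assumption: the first frame has no previous frame and is never a switching frame.
        if prev_bit is None or bit == prev_bit:
            scheme = "first" if bit == 0 else "second"
        else:
            scheme = "switching"
        outputs.append(decoders[scheme](frame_index))
        prev_bit = bit  # update the cache for the next frame
    return outputs
```

  • Note that the cache stores the initial scheme, not the resolved one, so the frame after a switching frame compares equal to it and is decoded with its own initial scheme.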
  • the HOA signal of the audio frame is encoded and decoded by combining two schemes (namely, a codec scheme based on virtual speaker selection and a codec scheme based on directional audio coding), that is, an appropriate codec scheme is selected for different audio frames, which can improve the compression rate of the audio signal.
  • Figure 16 is a schematic structural diagram of an encoding device 1600 provided by an embodiment of the present application.
  • the encoding device 1600 can be implemented by software, hardware, or a combination of the two as part or all of an encoding end device, and the encoding end device can be any encoding end device in the foregoing embodiments.
  • the apparatus 1600 includes: a first determination module 1601 and a first encoding module 1602 .
  • the first determining module 1601 is configured to determine the coding scheme of the current frame according to the higher-order ambisonics (HOA) signal of the current frame, and the coding scheme of the current frame is one of the first coding scheme, the second coding scheme and the third coding scheme;
  • the first coding scheme is an HOA coding scheme based on directional audio coding
  • the second coding scheme is an HOA coding scheme based on virtual speaker selection
  • the third coding scheme is a hybrid coding scheme
  • the first encoding module 1602 is configured to encode the signal of the specified channel in the HOA signal into the code stream if the encoding scheme of the current frame is the third encoding scheme, and the specified channel is a part of all channels of the HOA signal.
  • the signal of the specified channel includes a first-order ambisonics (FOA) signal
  • the FOA signal includes an omnidirectional W signal, and directional X signals, Y signals, and Z signals.
  • the first encoding module 1602 includes:
  • the first determination submodule is used to determine the virtual speaker signal and the residual signal based on the W signal, the X signal, the Y signal and the Z signal;
  • the encoding sub-module is used to encode the virtual loudspeaker signal and the residual signal into a code stream.
  • the first determination submodule is used for:
  • the three residual signals are determined based on the W signal, the X signal, the Y signal and the Z signal, or the X signal, the Y signal and the Z signal are determined as the three residual signals.
  • the encoding submodule is used to:
  • the obtained three-way stereo signals are respectively coded into bit streams through a stereo encoder.
  • the encoding submodule is used to:
  • the first preset mono signal is an all-zero signal or an all-one signal.
  • the all-zero signal includes a signal whose sampling point values are all zero or a signal whose frequency point values are all zero.
  • the all-one signal includes a signal whose sampling point values are all one or a signal whose frequency point values are all one; the second preset mono signal is an all-zero signal or an all-one signal; the first preset mono signal and the second preset mono signal are the same or different.
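  • The encoder-side pairing of four mono signals into three stereo signals, padded with preset mono signals, can be sketched as follows. The channel order inside each pair and all names are assumptions; the stereo encoder itself is out of scope here.

```python
from typing import List, Sequence, Tuple

Stereo = Tuple[List[float], List[float]]


def preset_mono(length: int, all_ones: bool = False) -> List[float]:
    """Preset mono signal: all-zero by default, all-one if requested."""
    return [1.0 if all_ones else 0.0] * length


def pack_into_stereo_pairs(
    virtual_speaker: Sequence[float],
    residuals: Sequence[Sequence[float]],
    first_all_ones: bool = False,
    second_all_ones: bool = False,
) -> Tuple[Stereo, Stereo, Stereo]:
    """Form S1 (virtual speaker + preset), S2 (two residuals), S3 (last residual + preset)."""
    r1, r2, r3 = residuals
    n = len(virtual_speaker)
    # Assumption: the preset mono signal occupies the second channel of S1 and S3.
    s1 = (list(virtual_speaker), preset_mono(n, first_all_ones))
    s2 = (list(r1), list(r2))
    s3 = (list(r3), preset_mono(n, second_all_ones))
    return s1, s2, s3
```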
  • the encoding submodule is used to:
  • the one virtual speaker signal and each of the three residual signals are respectively encoded into the code stream through a mono encoder.
  • the device 1600 also includes:
  • the second encoding module is used to encode the HOA signal into the code stream according to the first encoding scheme if the encoding scheme of the current frame is the first encoding scheme;
  • the third encoding module is configured to encode the HOA signal into the code stream according to the second encoding scheme if the encoding scheme of the current frame is the second encoding scheme.
  • the first determining module 1601 includes:
  • the second determining submodule is used to determine the initial encoding scheme of the current frame according to the HOA signal, where the initial encoding scheme is the first encoding scheme or the second encoding scheme;
  • the fourth determining submodule is used to determine that the encoding scheme of the current frame is the third encoding scheme if the initial encoding scheme of the current frame is the first encoding scheme and the initial encoding scheme of the previous frame of the current frame is the second encoding scheme, or if the initial encoding scheme of the current frame is the second encoding scheme and the initial encoding scheme of the previous frame of the current frame is the first encoding scheme.
  • the device 1600 also includes:
  • the fourth encoding module is configured to encode the indication information of the initial encoding scheme of the current frame into the code stream.
  • the device 1600 also includes:
  • the second determination module is used to determine the value of the switching flag of the current frame.
  • when the encoding scheme of the current frame is the first encoding scheme or the second encoding scheme, the value of the switching flag of the current frame is the first value; when the encoding scheme of the current frame is the third encoding scheme, the value of the switching flag of the current frame is the second value;
  • the fifth encoding module is used to encode the value of the switching flag into the code stream.
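  • A minimal sketch of how the switching flag might be written at the encoding end. The concrete values 0 and 1 for the first value and the second value, and the one-bit scheme indication written after a flag equal to the first value, are assumptions chosen for illustration.

```python
from typing import List

FIRST_VALUE, SECOND_VALUE = 0, 1  # assumption: concrete flag values are not given in the text


def write_scheme_bits(scheme: str) -> List[int]:
    """Return the bits signalling the coding scheme of the current frame."""
    if scheme == "third":
        # Switching frame: the flag alone identifies the scheme.
        return [SECOND_VALUE]
    if scheme == "first":
        return [FIRST_VALUE, 0]
    if scheme == "second":
        return [FIRST_VALUE, 1]
    raise ValueError(f"unknown coding scheme: {scheme!r}")
```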
  • the sixth encoding module is configured to encode the indication information of the encoding scheme of the current frame into the code stream.
  • the specified channel is consistent with the preset transmission channel in the first encoding scheme.
  • two schemes i.e. the codec scheme based on virtual speaker selection and the codec scheme based on directional audio coding
  • the codec scheme based on virtual speaker selection and the codec scheme based on directional audio coding
  • Fig. 17 is a schematic structural diagram of a decoding device 1700 provided by the embodiment of the present application.
  • the decoding device 1700 can be implemented by software, hardware or a combination of the two as part or all of a decoding end device, and the decoding end device can be any decoding end device in the foregoing embodiments.
  • the apparatus 1700 includes: a first obtaining module 1701 , a first determining module 1702 , a second determining module 1703 , a third determining module 1704 and a second obtaining module 1705 .
  • the first obtaining module 1701 is used to obtain the decoding scheme of the current frame based on the code stream, and the decoding scheme of the current frame is one of the first decoding scheme, the second decoding scheme and the third decoding scheme; wherein the first decoding scheme is a higher-order ambisonics (HOA) decoding scheme based on directional audio coding, the second decoding scheme is an HOA decoding scheme based on virtual speaker selection, and the third decoding scheme is a hybrid decoding scheme;
  • the first determination module 1702 is used to determine the signal of the specified channel in the HOA signal of the current frame based on the code stream if the decoding scheme of the current frame is the third decoding scheme, and the specified channel is a part of all channels of the HOA signal;
  • the second determination module 1703 is configured to determine the gain of one or more remaining channels in the HOA signal except for the specified channel based on the signal of the specified channel;
  • the second obtaining module 1705 is configured to obtain the reconstructed HOA signal of the current frame based on the signal of the designated channel and the signals of the one or more remaining channels.
  • a first determining submodule configured to determine a virtual speaker signal and a residual signal based on a code stream
  • the second determining submodule is configured to determine the signal of the specified channel based on the virtual speaker signal and the residual signal.
  • the first determination submodule is used for:
  • one virtual speaker signal and three residual signals are determined.
  • the first determination submodule is used for:
  • the code stream is decoded by a monophonic decoder to obtain one virtual speaker signal and three residual signals.
  • the X signal, the Y signal and the Z signal are determined based on the residual signal and the W signal, or the X signal, the Y signal and the Z signal are determined based on the residual signal.
  • the device 1700 also includes:
  • the first decoding module is used to obtain the reconstructed HOA signal of the current frame according to the code stream according to the first decoding scheme if the decoding scheme of the current frame is the first decoding scheme;
  • the second decoding module is configured to obtain the reconstructed HOA signal of the current frame according to the code stream according to the second decoding scheme if the decoding scheme of the current frame is the second decoding scheme.
  • the second decoding module includes:
  • the first obtaining submodule is used to obtain the initial HOA signal according to the code stream according to the second decoding scheme
  • the gain adjustment submodule is used to adjust the gain of the high-order part of the initial HOA signal according to the high-order gain of the previous frame of the current frame if the decoding scheme of the previous frame of the current frame is the third decoding scheme;
  • the second obtaining sub-module is used to obtain the reconstructed HOA signal based on the low-order part of the initial HOA signal and the gain-adjusted high-order part.
  • the first obtaining module 1701 includes:
  • the second parsing submodule is used to parse the indication information of the decoding scheme of the current frame from the code stream if the value of the switching flag is the first value, and the indication information is used to indicate that the decoding scheme of the current frame is the first decoding scheme or second decoding scheme;
  • the third determining submodule is configured to determine that the decoding scheme of the current frame is the third decoding scheme if the value of the switching flag is the second value.
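  • The flag-first parsing performed by these submodules can be sketched as follows. As an assumption, the first value and the second value are taken to be 0 and 1, and the scheme indication that follows a flag equal to the first value is taken to be one bit (0 for the first decoding scheme, 1 for the second).

```python
from typing import Sequence, Tuple

FIRST_VALUE, SECOND_VALUE = 0, 1  # assumption: concrete flag values are not given in the text


def parse_scheme(bits: Sequence[int], offset: int = 0) -> Tuple[str, int]:
    """Parse the switching flag, then (if needed) the scheme indication."""
    flag = bits[offset]
    if flag == SECOND_VALUE:
        # Switching frame: no further indication information is read.
        return "third", offset + 1
    indication = bits[offset + 1]
    return ("first" if indication == 0 else "second"), offset + 2
```

  • With this layout a switching frame spends one bit on scheme signalling and every other frame spends two.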
  • the first obtaining module 1701 includes:
  • the third parsing sub-module is used to parse out the indication information of the decoding scheme of the current frame from the code stream, and the indication information is used to indicate that the decoding scheme of the current frame is the first decoding scheme, the second decoding scheme or the third decoding scheme.
  • the first obtaining module 1701 includes:
  • the fourth parsing submodule is used to parse out the initial decoding scheme of the current frame from the code stream, where the initial decoding scheme is the first decoding scheme or the second decoding scheme;
  • the fourth determining submodule is used to determine that the decoding scheme of the current frame is the initial decoding scheme of the current frame if the initial decoding scheme of the current frame is the same as the initial decoding scheme of the previous frame of the current frame;
  • the fifth determining submodule is used to determine that the decoding scheme of the current frame is the third decoding scheme if the initial decoding scheme of the current frame is the first decoding scheme and the initial decoding scheme of the previous frame of the current frame is the second decoding scheme, or if the initial decoding scheme of the current frame is the second decoding scheme and the initial decoding scheme of the previous frame of the current frame is the first decoding scheme.
  • two schemes i.e. the codec scheme based on virtual speaker selection and the codec scheme based on directional audio coding
  • the codec scheme based on virtual speaker selection and the codec scheme based on directional audio coding
  • when the decoding device provided in the above embodiment decodes audio frames, the division into the above functional modules is used only as an example for illustration. In practical applications, the above functions can be allocated to different functional modules as needed, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.
  • the decoding device and the decoding method embodiments provided in the above embodiments belong to the same idea, and the specific implementation process thereof is detailed in the method embodiments, and will not be repeated here.
  • Fig. 18 is a schematic block diagram of a codec device 1800 used in an embodiment of the present application.
  • the codec apparatus 1800 may include a processor 1801 , a memory 1802 and a bus system 1803 .
  • the processor 1801 and the memory 1802 are connected through the bus system 1803, the memory 1802 is used to store instructions, and the processor 1801 is used to execute the instructions stored in the memory 1802 to perform various encoding or decoding described in the embodiments of this application method. To avoid repetition, no detailed description is given here.
  • the processor 1801 can be a central processing unit (CPU), and the processor 1801 can also be another general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the memory 1802 may include a ROM device or a RAM device. Any other suitable type of storage device may also be used as memory 1802 .
  • Memory 1802 may include code and data 18021 accessed by processor 1801 using bus 1803 .
  • the memory 1802 may further include an operating system 18023 and an application program 18022, where the application program 18022 includes at least one program that allows the processor 1801 to execute the encoding or decoding method described in the embodiment of this application.
  • the application program 18022 may include applications 1 to N, which further include an encoding or decoding application (codec application for short) that executes the encoding or decoding method described in the embodiment of this application.
  • the bus system 1803 may include not only a data bus, but also a power bus, a control bus, and a status signal bus. However, for clarity of illustration, the various buses are labeled as bus system 1803 in the figure.
  • the codec apparatus 1800 may also include one or more output devices, such as a display 1804 .
  • display 1804 may be a touch-sensitive display that incorporates a display with a haptic unit operable to sense touch input.
  • the display 1804 may be connected to the processor 1801 via the bus 1803 .
  • codec device 1800 may implement the encoding method in the embodiment of the present application, and may also implement the decoding method in the embodiment of the present application.
  • Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media, which include any medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol).
  • a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium, such as a signal or carrier wave.
  • Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this application.
  • a computer program product may include a computer readable medium.
  • such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage, flash memory, or any other medium that can store the desired program code in the form of instructions or data structures and can be accessed by a computer.
  • any connection is properly termed a computer-readable medium.
  • coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), and wireless technologies such as infrared, radio, and microwave are included in the definition of media.
  • Disk and disc include compact disc (CD), laser disc, optical disc, DVD, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • DSPs digital signal processors
  • ASICs application specific integrated circuits
  • FPGAs field programmable gate arrays
  • the term "processor,” as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein.
  • the functionality described by the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec.
  • the techniques may be fully implemented in one or more circuits or logic elements.
  • various illustrative logical blocks, units, and modules in the encoder 100 and the decoder 200 may be understood as corresponding circuit devices or logic elements.
  • the embodiments of the present application may be implemented in a wide variety of apparatuses or devices, including a wireless handset, an integrated circuit (IC), or a group of ICs (e.g., a chipset).
  • Various components, modules or units are described in the embodiments of the present application to emphasize the functional aspects of the apparatus for performing the disclosed technology, but they do not necessarily need to be realized by different hardware units. Indeed, as described above, the various units may be combined in a codec hardware unit in conjunction with suitable software and/or firmware, or provided by interoperating hardware units (including one or more processors as described above).
  • the above embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof.
  • When implemented using software, they may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application will be generated in whole or in part.
  • the computer can be a general purpose computer, a special purpose computer, a computer network or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or may be a data storage device such as a server or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)).
  • the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data used for analysis, stored data, displayed data, etc.), and signals are all authorized by the user or fully authorized by all parties, and the collection, use and processing of relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions.

Abstract

An encoding and decoding method and apparatus, a device, a storage medium and a computer program product, belonging to the technical field of audio processing. In the encoding and decoding method, an encoding and decoding scheme based on virtual loudspeaker selection is combined with an encoding and decoding scheme based on directional audio coding to encode and decode the HOA signal of an audio frame; that is, a suitable encoding and decoding scheme is selected for each audio frame, so that the compression rate of the audio signal can be improved. In addition, to achieve a smooth transition of auditory quality when switching between the two schemes, some audio frames are encoded and decoded with neither of the two schemes directly but with a new, compromise scheme in which the signals of specified channels in the HOA signals of those frames are encoded into a code stream. This realizes a smooth transition of auditory quality when the HOA signals recovered by decoding are rendered and played.

Description

Encoding and decoding method, apparatus, device, storage medium and computer program product
This application claims priority to Chinese patent application No. 202111155384.0, filed on September 29, 2021 and entitled "Encoding and decoding method, apparatus, device, storage medium and computer program product", the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of this application relate to the technical field of audio processing, and in particular to an encoding and decoding method, apparatus, device, storage medium, and computer program product.
Background
Higher order ambisonics (HOA) is a three-dimensional audio technology that has attracted wide attention because it offers greater flexibility in three-dimensional audio playback. To achieve a better auditory effect, HOA needs a large amount of data to record detailed sound scene information. As the HOA order increases, however, still more data is produced, and this volume of data makes transmission and storage difficult. How to encode and decode HOA signals has therefore become a key concern.
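To make the growth in data volume concrete, the short sketch below uses the standard ambisonics property that a full three-dimensional signal of order N has (N + 1)^2 channels. That formula is general background rather than something stated in this passage, and the function name is illustrative only.

```python
def hoa_channel_count(order: int) -> int:
    """Channel count of a full 3D ambisonic signal of the given order: (order + 1) ** 2."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


if __name__ == "__main__":
    for order in (1, 2, 3, 4):
        print(f"order {order}: {hoa_channel_count(order)} channels")
```

Under this formula a first-order (FOA) signal has 4 channels and a third-order signal has 16, so the amount of audio data per frame grows quadratically with the order.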
The related art proposes two schemes for encoding and decoding HOA signals. The first is a codec scheme based on directional audio coding (DirAC). In this scheme, the encoder extracts a core-layer signal and spatial parameters from the HOA signal of the current frame and encodes them into the code stream. The decoder reconstructs the HOA signal of the current frame from the code stream using a decoding method symmetric to the encoding. The second is a codec scheme based on virtual speaker selection. In this scheme, the encoder uses a match-projection (MP) algorithm to select, from a set of virtual speakers, a target virtual speaker that matches the HOA signal of the current frame; determines a virtual speaker signal based on the HOA signal of the current frame and the target virtual speaker; determines a residual signal based on the HOA signal of the current frame and the virtual speaker signal; and encodes the virtual speaker signal and the residual signal into the code stream. The decoder reconstructs the HOA signal of the current frame from the code stream using a decoding method symmetric to the encoding.
However, when the sound field contains few heterogeneous sound sources, the codec scheme based on virtual speaker selection achieves the higher compression rate, and when the sound field contains many heterogeneous sound sources, the DirAC-based codec scheme achieves the higher compression rate. Here, heterogeneous sound sources are point sound sources that differ in position and/or direction. The sound field type of an audio frame (which is related to the heterogeneous sound sources in the sound field) may differ from frame to frame. To achieve a high compression rate for audio frames of every sound field type, a suitable codec scheme must be selected for each audio frame according to its sound field type, which requires switching between codec schemes. However, HOA signals reconstructed with different codec schemes differ in auditory quality after rendering and playback, so how to ensure a smooth transition of auditory quality when switching between codec schemes is a problem that currently needs to be considered.
Summary
Embodiments of this application provide an encoding and decoding method, apparatus, device, storage medium, and computer program product that can ensure a smooth transition of auditory quality when switching between different codec schemes. The technical solutions are as follows:
According to a first aspect, an encoding method is provided. The method includes:
determining the encoding scheme of the current frame according to the HOA signal of the current frame, where the encoding scheme of the current frame is one of a first encoding scheme, a second encoding scheme, and a third encoding scheme. The first encoding scheme is an HOA encoding scheme based on directional audio coding (the DirAC encoding scheme), the second encoding scheme is an HOA encoding scheme based on virtual speaker selection (the MP-based HOA encoding scheme for short), and the third encoding scheme is a hybrid encoding scheme. If the encoding scheme of the current frame is the third encoding scheme, the signals of specified channels in the HOA signal are encoded into the code stream, the specified channels being a subset of all channels of the HOA signal. The hybrid encoding scheme is so called because its encoding process uses technical means related to both the first encoding scheme (the DirAC encoding scheme) and the second encoding scheme (the MP-based HOA encoding scheme).
In the embodiments of this application, a suitable codec scheme is selected for each audio frame, which can improve the compression rate of the audio signal. In addition, some audio frames are encoded and decoded with neither the first nor the second encoding scheme directly but with a new codec scheme in which the signals of the specified channels in the HOA signals of these frames are encoded into the code stream. This compromise scheme allows the auditory quality to transition smoothly when the HOA signals recovered by decoding are rendered and played.
Optionally, the signals of the specified channels include a first-order ambisonics (FOA) signal, and the FOA signal includes an omnidirectional W signal and directional X, Y, and Z signals.
Optionally, encoding the signals of the specified channels in the HOA signal into the code stream includes: determining a virtual speaker signal and residual signals based on the W, X, Y, and Z signals; and encoding the virtual speaker signal and the residual signals into the code stream.
Optionally, determining the virtual speaker signal and the residual signals based on the W, X, Y, and Z signals includes: determining the W signal as one virtual speaker signal; and determining three residual signals based on the W, X, Y, and Z signals, or determining the X, Y, and Z signals as the three residual signals. Optionally, the difference signals between each of the X, Y, and Z signals and the W signal are determined as the three residual signals.
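The two residual options described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name and the `use_difference` switch are hypothetical, and the sign convention of the difference signal (X minus W rather than W minus X) is an assumption, since the text only says "difference signals".

```python
import numpy as np


def foa_to_vs_and_residuals(w, x, y, z, use_difference=False):
    """Split FOA channels into one virtual speaker signal and three residual signals.

    use_difference=False: the residuals are X, Y and Z themselves.
    use_difference=True:  the residuals are X - W, Y - W and Z - W (assumed sign).
    """
    w, x, y, z = (np.asarray(c, dtype=float) for c in (w, x, y, z))
    virtual_speaker = w.copy()  # W is used directly as the single virtual speaker signal
    if use_difference:
        residuals = [x - w, y - w, z - w]
    else:
        residuals = [x.copy(), y.copy(), z.copy()]
    return virtual_speaker, residuals
```

Either way the encoder ends up with four mono signals (one virtual speaker signal and three residuals), which is the input expected by the stereo or mono core encoders described next.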
Optionally, encoding the virtual speaker signal and the residual signals into the code stream includes: combining the one virtual speaker signal with a first preset mono signal to obtain one stereo signal; combining the three residual signals with a second preset mono signal to obtain two stereo signals; and encoding the resulting three stereo signals into the code stream separately with a stereo encoder.
Optionally, combining the three residual signals with the second preset mono signal to obtain two stereo signals includes: combining the two most highly correlated of the three residual signals to obtain one of the two stereo signals; and combining the remaining residual signal with the second preset mono signal to obtain the other of the two stereo signals.
Optionally, the first preset mono signal is an all-zero signal or an all-one signal. An all-zero signal is a signal whose sample values are all zero or whose frequency-bin values are all zero, and an all-one signal is a signal whose sample values are all one or whose frequency-bin values are all one. The second preset mono signal is an all-zero signal or an all-one signal. The first and second preset mono signals may be the same or different.
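The pairing of four mono signals into three stereo signals can be sketched as below. Several details are assumptions made only for illustration: the correlation measure (absolute normalized cross-correlation), which signal goes in which stereo channel, and all function names. The text does not specify any of these.

```python
import numpy as np


def preset_mono(length, kind="zero"):
    """Preset mono signal: all zeros or all ones."""
    if kind == "zero":
        return np.zeros(length)
    if kind == "one":
        return np.ones(length)
    raise ValueError("kind must be 'zero' or 'one'")


def abs_correlation(a, b):
    """Absolute normalized correlation (assumed measure); 0.0 if either signal is silent."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0.0 else abs(float(np.dot(a, b))) / denom


def pack_into_stereo(virtual_speaker, residuals, kind1="zero", kind2="zero"):
    """Return three (left, right) pairs and the residual layout (i, j, k)."""
    vs = np.asarray(virtual_speaker, dtype=float)
    res = [np.asarray(r, dtype=float) for r in residuals]
    n = len(vs)
    candidates = [(0, 1), (0, 2), (1, 2)]
    # Pick the two residuals with the highest mutual correlation.
    i, j = max(candidates, key=lambda p: abs_correlation(res[p[0]], res[p[1]]))
    k = 3 - i - j  # index of the leftover residual
    stereo = [
        (vs, preset_mono(n, kind1)),      # virtual speaker + first preset signal
        (res[i], res[j]),                 # most correlated residual pair
        (res[k], preset_mono(n, kind2)),  # leftover residual + second preset signal
    ]
    return stereo, (i, j, k)
```

Pairing the most correlated residuals lets a stereo core encoder exploit their redundancy, while the preset signal simply pads the odd signal out to a stereo pair. A decoder would need to know the layout `(i, j, k)`; how that is conveyed is not covered here.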
Optionally, encoding the virtual speaker signal and the residual signals into the code stream includes: encoding the one virtual speaker signal and each of the three residual signals into the code stream separately with a mono encoder.
Optionally, after the encoding scheme of the current frame is determined according to the HOA signal of the current frame, the method further includes: if the encoding scheme of the current frame is the first encoding scheme, encoding the HOA signal into the code stream according to the first encoding scheme; and if the encoding scheme of the current frame is the second encoding scheme, encoding the HOA signal into the code stream according to the second encoding scheme.
Optionally, determining the encoding scheme of the current frame according to the HOA signal of the current frame includes: determining an initial encoding scheme of the current frame according to the HOA signal, the initial encoding scheme being the first encoding scheme or the second encoding scheme; if the initial encoding scheme of the current frame is the same as the initial encoding scheme of the previous frame, determining that the encoding scheme of the current frame is its initial encoding scheme; and if the initial encoding scheme of the current frame is the first encoding scheme and that of the previous frame is the second encoding scheme, or the initial encoding scheme of the current frame is the second encoding scheme and that of the previous frame is the first encoding scheme, determining that the encoding scheme of the current frame is the third encoding scheme.
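The decision rule above compares the initial schemes of consecutive frames and can be sketched as follows. The enum and function names are illustrative, and the handling of the very first frame (no previous frame) is an assumption, since the text does not address it.

```python
from enum import Enum


class Scheme(Enum):
    DIRAC = 1   # first scheme: DirAC-based
    MP = 2      # second scheme: virtual speaker selection (MP-based)
    HYBRID = 3  # third scheme: hybrid, used for switch frames


def final_scheme(initial_current, initial_previous):
    """Map the initial schemes of the current and previous frame to the final scheme."""
    if initial_current not in (Scheme.DIRAC, Scheme.MP):
        raise ValueError("the initial scheme must be the first or second scheme")
    # Assumption: a frame with no predecessor keeps its initial scheme.
    if initial_previous is None or initial_current == initial_previous:
        return initial_current
    return Scheme.HYBRID


if __name__ == "__main__":
    initial = [Scheme.DIRAC, Scheme.DIRAC, Scheme.MP, Scheme.MP, Scheme.DIRAC]
    previous = None
    for frame_index, current in enumerate(initial):
        print(frame_index, final_scheme(current, previous).name)
        previous = current  # the comparison uses the previous frame's *initial* scheme
```

For the example sequence the final schemes are DIRAC, DIRAC, HYBRID, MP, HYBRID: only the first frame after each change of initial scheme becomes a hybrid (switch) frame.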
Optionally, after the initial encoding scheme of the current frame is determined according to the HOA signal, the method further includes: encoding indication information of the initial encoding scheme of the current frame into the code stream.
Optionally, after the encoding scheme of the current frame is determined according to the HOA signal of the current frame, the method further includes: determining the value of a switching flag of the current frame, where the value of the switching flag is a first value when the encoding scheme of the current frame is the first or the second encoding scheme, and a second value when the encoding scheme of the current frame is the third encoding scheme; and encoding the value of the switching flag into the code stream. That is, the switching flag indicates whether the current frame is a switch frame.
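One possible way to signal the scheme with a switching flag is sketched below. The concrete values (0 and 1), the one-bit widths, the scheme numbering as integers 1 to 3, and the choice to follow a non-switch flag with a one-bit scheme indicator are all assumptions for illustration; the text only requires that a first and a second value be distinguishable.

```python
FIRST_VALUE = 0   # assumed flag value: frame uses the first or second scheme
SECOND_VALUE = 1  # assumed flag value: frame is a switch frame (third scheme)


def switch_flag(scheme: int) -> int:
    """Return the switching flag for a scheme number in {1, 2, 3}."""
    if scheme in (1, 2):
        return FIRST_VALUE
    if scheme == 3:
        return SECOND_VALUE
    raise ValueError("scheme must be 1, 2 or 3")


def frame_header_bits(scheme: int) -> list:
    """Hypothetical header layout: flag bit, then one indicator bit if not a switch frame."""
    flag = switch_flag(scheme)
    if flag == SECOND_VALUE:
        return [flag]
    return [flag, scheme - 1]  # indicator: 0 for the first scheme, 1 for the second
```

With this layout a switch frame costs a single header bit, and an ordinary frame costs two.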
Optionally, after the encoding scheme of the current frame is determined according to the HOA signal of the current frame, the method further includes: encoding indication information of the encoding scheme of the current frame into the code stream.
Optionally, the specified channels are consistent with the transmission channels preset in the first encoding scheme. This ensures that the auditory quality of a switch frame is close to that of an audio frame encoded with the first encoding scheme.
According to a second aspect, a decoding method is provided. The method includes:
obtaining the decoding scheme of the current frame based on the code stream, where the decoding scheme of the current frame is one of a first decoding scheme, a second decoding scheme, and a third decoding scheme. The first decoding scheme is a higher order ambisonics (HOA) decoding scheme based on directional audio decoding, the second decoding scheme is an HOA decoding scheme based on virtual speaker selection, and the third decoding scheme is a hybrid decoding scheme. If the decoding scheme of the current frame is the third decoding scheme, the method includes: determining the signals of specified channels in the HOA signal of the current frame based on the code stream, the specified channels being a subset of all channels of the HOA signal; determining, based on the signals of the specified channels, the gains of one or more remaining channels of the HOA signal other than the specified channels; determining the signal of each of the one or more remaining channels based on the signals of the specified channels and the gains of the one or more remaining channels; and obtaining the reconstructed HOA signal of the current frame based on the signals of the specified channels and the signals of the one or more remaining channels. The hybrid decoding scheme is so called because its decoding process uses technical means related to both the first decoding scheme (the DirAC decoding scheme) and the second decoding scheme (the MP-based HOA decoding scheme).
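The data flow of the third decoding scheme can be sketched as below. This is only a structural illustration under loudly stated assumptions: the specified channels are taken to be the four FOA channels, the gains are passed in already computed (how they are derived from the specified channels is not given in this passage), each remaining channel is modeled as its gain times the W signal (a placeholder rule, not the application's rule), and the channel order puts the specified channels first. All names are hypothetical.

```python
import numpy as np


def reconstruct_hybrid_frame(foa, gains):
    """Assemble a reconstructed HOA frame from decoded FOA channels and per-channel gains.

    foa:   array of shape (4, T) holding the decoded W, X, Y, Z signals.
    gains: array of shape (K,), one gain per remaining (higher-order) channel.
    Returns an array of shape (4 + K, T).
    """
    foa = np.asarray(foa, dtype=float)
    gains = np.asarray(gains, dtype=float)
    if foa.ndim != 2 or foa.shape[0] != 4:
        raise ValueError("foa must have shape (4, T)")
    w = foa[0]
    # Placeholder rule (assumption): remaining channel k = gains[k] * W.
    remaining = gains[:, None] * w[None, :]
    return np.vstack([foa, remaining])
```

For a third-order signal, for example, `gains` would have 12 entries so that the output has 16 channels.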
In the embodiments of this application, because the encoder encodes the signals of the specified channels into the code stream when it uses the third encoding scheme for the HOA signal of the current frame, the decoder parses the signals of the specified channels from the code stream, reconstructs the signals of the remaining channels from them, and then reconstructs the HOA signal. That is, a compromise scheme is adopted so that the auditory quality transitions smoothly when the HOA signal recovered by decoding is rendered and played.
Optionally, determining the signals of the specified channels in the HOA signal of the current frame based on the code stream includes: determining a virtual speaker signal and residual signals based on the code stream; and determining the signals of the specified channels based on the virtual speaker signal and the residual signals.
Optionally, determining the virtual speaker signal and the residual signals based on the code stream includes: decoding the code stream with a stereo decoder to obtain three stereo signals; and determining one virtual speaker signal and three residual signals based on the three stereo signals.
Optionally, determining one virtual speaker signal and three residual signals based on the three stereo signals includes: determining the one virtual speaker signal based on one of the three stereo signals; and determining the three residual signals based on the other two stereo signals.
Optionally, determining the virtual speaker signal and the residual signals based on the code stream includes: decoding the code stream with a mono decoder to obtain one virtual speaker signal and three residual signals.
Optionally, the signals of the specified channels include a first-order ambisonics (FOA) signal, and the FOA signal includes an omnidirectional W signal and directional X, Y, and Z signals. Determining the signals of the specified channels based on the virtual speaker signal and the residual signals includes: determining the W signal based on the virtual speaker signal; and determining the X, Y, and Z signals based on the residual signals and the W signal, or determining the X, Y, and Z signals based on the residual signals.
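The decoder-side recovery of the FOA channels mirrors the encoder-side split. The sketch below assumes the virtual speaker signal is W itself and, for the difference variant, that each residual was formed as a channel minus W; both the function name and the `residuals_are_differences` switch are hypothetical.

```python
import numpy as np


def vs_and_residuals_to_foa(virtual_speaker, residuals, residuals_are_differences=False):
    """Recover W, X, Y, Z as an array of shape (4, T).

    residuals_are_differences=False: X, Y, Z are the residuals themselves.
    residuals_are_differences=True:  X, Y, Z are residual + W (assumed sign convention).
    """
    w = np.asarray(virtual_speaker, dtype=float)
    res = [np.asarray(r, dtype=float) for r in residuals]
    if len(res) != 3:
        raise ValueError("exactly three residual signals are expected")
    if residuals_are_differences:
        x, y, z = (r + w for r in res)
    else:
        x, y, z = res
    return np.stack([w, x, y, z])
```

The two branches correspond to the two options in the text: using the residuals together with W, or using the residuals alone.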
Optionally, the method further includes: if the decoding scheme of the current frame is the first decoding scheme, obtaining the reconstructed HOA signal of the current frame from the code stream according to the first decoding scheme; and if the decoding scheme of the current frame is the second decoding scheme, obtaining the reconstructed HOA signal of the current frame from the code stream according to the second decoding scheme.
Optionally, obtaining the reconstructed HOA signal of the current frame from the code stream according to the second decoding scheme includes: obtaining an initial HOA signal from the code stream according to the second decoding scheme; if the decoding scheme of the previous frame is the third decoding scheme, performing gain adjustment on the high-order part of the initial HOA signal according to the high-order gain of the previous frame; and obtaining the reconstructed HOA signal based on the low-order part of the initial HOA signal and the gain-adjusted high-order part. That is, the high-order gain adjustment makes the transition in auditory quality even smoother.
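The gain adjustment after a switch frame can be sketched as follows. The split point between the low-order and high-order parts (the first four channels, i.e. the FOA part) and the form of the gain (a scalar or one value per high-order channel) are assumptions for illustration; the passage does not fix either, and the names are hypothetical.

```python
import numpy as np


def adjust_high_order(initial_hoa, prev_high_order_gain, prev_scheme_is_third, n_low=4):
    """Scale the high-order channels of an MP-decoded frame that follows a switch frame.

    initial_hoa:          array of shape (C, T), low-order channels first (assumed order).
    prev_high_order_gain: scalar, or array with one gain per high-order channel.
    prev_scheme_is_third: True if the previous frame used the third decoding scheme.
    """
    hoa = np.array(initial_hoa, dtype=float)  # copy so the input is left untouched
    if prev_scheme_is_third:
        gain = np.asarray(prev_high_order_gain, dtype=float).reshape(-1, 1)
        hoa[n_low:] *= gain  # the low-order part hoa[:n_low] is kept as decoded
    return hoa
```

When the previous frame was not a switch frame, the initial HOA signal is returned unchanged.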
Optionally, obtaining the decoding scheme of the current frame based on the code stream includes: parsing the value of the switching flag of the current frame from the code stream; if the value of the switching flag is the first value, parsing indication information of the decoding scheme of the current frame from the code stream, the indication information indicating whether the decoding scheme of the current frame is the first decoding scheme or the second decoding scheme; and if the value of the switching flag is the second value, determining that the decoding scheme of the current frame is the third decoding scheme.
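The flag-first parsing order can be sketched as below. The bit values, the one-bit widths, and the representation of the code stream as an iterator of integer bits are assumptions for illustration only; the text does not define the syntax of the code stream.

```python
FIRST_VALUE = 0   # assumed flag value: not a switch frame
SECOND_VALUE = 1  # assumed flag value: switch frame


def parse_decoding_scheme(bits):
    """Read the scheme of one frame from an iterator of bits; returns 1, 2 or 3."""
    flag = next(bits)
    if flag == SECOND_VALUE:
        return 3  # switch frame: third (hybrid) decoding scheme, no indicator is read
    indicator = next(bits)  # assumed: 0 selects the first scheme, 1 the second
    return 1 if indicator == 0 else 2


if __name__ == "__main__":
    stream = iter([0, 0, 0, 1, 1])  # three frame headers back to back
    print([parse_decoding_scheme(stream) for _ in range(3)])
```

Because the function consumes bits from a shared iterator, headers of consecutive frames can be read one after another, as in the usage example.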
Optionally, obtaining the decoding scheme of the current frame based on the code stream includes: parsing indication information of the decoding scheme of the current frame from the code stream, the indication information indicating whether the decoding scheme of the current frame is the first decoding scheme, the second decoding scheme, or the third decoding scheme.
Optionally, obtaining the decoding scheme of the current frame based on the code stream includes: parsing an initial decoding scheme of the current frame from the code stream, the initial decoding scheme being the first decoding scheme or the second decoding scheme; if the initial decoding scheme of the current frame is the same as the initial decoding scheme of the previous frame, determining that the decoding scheme of the current frame is its initial decoding scheme; and if the initial decoding scheme of the current frame is the first decoding scheme and that of the previous frame is the second decoding scheme, or the initial decoding scheme of the current frame is the second decoding scheme and that of the previous frame is the first decoding scheme, determining that the decoding scheme of the current frame is the third decoding scheme.
According to a third aspect, an encoding apparatus is provided, which has the function of implementing the behavior of the encoding method in the first aspect. The encoding apparatus includes one or more modules configured to implement the encoding method provided in the first aspect.
That is, an encoding apparatus is provided, the apparatus including:
第一确定模块,用于根据当前帧的高阶立体混响HOA信号确定当前帧的编码方案,当前帧的编码方案为第一编码方案、第二编码方案和第三编码方案中的一种;其中,第一编码方案为基于方向音频编码的HOA编码方案,第二编码方案为基于虚拟扬声器选择的HOA编码方案,第三编码方案为混合编码方案;The first determination module is used to determine the coding scheme of the current frame according to the high-order ambisonics HOA signal of the current frame, and the coding scheme of the current frame is one of the first coding scheme, the second coding scheme and the third coding scheme; Wherein, the first coding scheme is an HOA coding scheme based on directional audio coding, the second coding scheme is an HOA coding scheme based on virtual speaker selection, and the third coding scheme is a hybrid coding scheme;
第一编码模块,用于若当前帧的编码方案为第三编码方案,则将HOA信号中指定通道的信号编入码流,指定通道为HOA信号的所有通道中的部分通道。The first encoding module is configured to encode the signal of the specified channel in the HOA signal into the code stream if the encoding scheme of the current frame is the third encoding scheme, and the specified channel is a part of all channels of the HOA signal.
可选地,指定通道的信号包括一阶立体混响FOA信号,FOA信号包括全向的W信号,以及定向的X信号、Y信号和Z信号。Optionally, the signal of the designated channel includes a first-order ambisonic reverberation FOA signal, and the FOA signal includes an omnidirectional W signal, and directional X signals, Y signals, and Z signals.
可选地,第一编码模块包括:Optionally, the first coding module includes:
第一确定子模块,用于基于W信号、X信号、Y信号和Z信号,确定虚拟扬声器信号和残差信号;The first determination submodule is used to determine the virtual speaker signal and the residual signal based on the W signal, the X signal, the Y signal and the Z signal;
编码子模块,用于将该虚拟扬声器信号和残差信号编入码流。The encoding sub-module is used to encode the virtual loudspeaker signal and the residual signal into a code stream.
可选地,第一确定子模块用于:Optionally, the first determination submodule is used for:
将W信号确定为一路虚拟扬声器信号;Determining the W signal as a virtual loudspeaker signal;
基于W信号、X信号、Y信号和Z信号确定三路残差信号,或者,将X信号、Y信号和Z信号确定为三路残差信号。The three residual signals are determined based on the W signal, the X signal, the Y signal and the Z signal, or the X signal, the Y signal and the Z signal are determined as the three residual signals.
可选地,编码子模块用于:Optionally, the encoding submodule is used to:
将这一路虚拟扬声器信号与第一路预设单声道信号组合,以得到一路立体声信号;Combine this virtual speaker signal with the first preset mono signal to get a stereo signal;
将这三路残差信号与第二路预设单声道信号组合,以得到两路立体声信号;Combining the three residual signals with the second preset mono signal to obtain two stereo signals;
通过立体声编码器将得到的三路立体声信号分别编入码流。The obtained three-way stereo signals are respectively coded into bit streams through a stereo encoder.
可选地,编码子模块用于:Optionally, the encoding submodule is used to:
将这三路残差信号中相关性最高的两路残差信号组合,以得到两路立体声信号中的一路立体声信号;combining the two most correlated residual signals among the three residual signals to obtain one stereo signal among the two stereo signals;
将这三路残差信号中除相关性最高的两路残差信号之外的一路残差信号与第二路预设单声道信号组合,以得到两路立体声信号中的另一路立体声信号。Combining one residual signal except for the two residual signals with the highest correlation among the three residual signals and the second preset mono signal to obtain the other stereo signal among the two stereo signals.
可选地,第一路预设单声道信号为全零信号或全一信号,全零信号包括采样点的值均为零的信号或者频点的值均为零的信号,全一信号包括采样点的值均为一的信号或者频点的值均为一的信号;第二路预设单声道信号为全零信号或全一信号;第一路预设单声道信号与第二路预设单声道信号相同或不同。Optionally, the first preset monophonic signal is an all-zero signal or an all-one signal. The all-zero signal includes a signal whose sampling point values are all zero or a signal whose frequency point values are all zero. The all-one signal includes The value of the sampling point is all one signal or the signal of the frequency point value is one; the second preset mono signal is all zero signal or all one signal; the first preset mono signal and the second the same or different preset mono signals.
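As a minimal sketch of the stereo pairing described above, the following Python code takes the W signal as the virtual speaker signal, uses X, Y, and Z directly as the three residual signals, and forms three stereo pairs. The function name `pair_for_stereo`, the use of a normalized absolute inner product as the correlation measure, and the choice of an all-zero signal for both preset mono signals are assumptions; the text does not specify how correlation is measured.

```python
import numpy as np


def pair_for_stereo(w, x, y, z, preset=None):
    """Return three (left, right) pairs intended for a stereo encoder."""
    w, x, y, z = (np.asarray(s, dtype=float) for s in (w, x, y, z))
    if preset is None:
        preset = np.zeros_like(w)  # all-zero preset mono signal (assumption)
    residuals = [x, y, z]  # option in which X, Y, Z are the residual signals

    def corr(a, b):
        # Assumed correlation measure: normalized absolute inner product.
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return 0.0 if denom == 0.0 else abs(float(np.dot(a, b))) / denom

    candidates = [(0, 1), (0, 2), (1, 2)]
    i, j = max(candidates, key=lambda p: corr(residuals[p[0]], residuals[p[1]]))
    k = ({0, 1, 2} - {i, j}).pop()  # index of the remaining residual
    return [
        (w, preset),                   # virtual speaker signal + first preset
        (residuals[i], residuals[j]),  # two most correlated residuals
        (residuals[k], preset),        # remaining residual + second preset
    ]
```

Pairing the two most correlated residuals lets a stereo encoder exploit their inter-channel redundancy, while the preset mono signal only pads the remaining pair.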
Optionally, the encoding submodule is configured to:
encode the one virtual speaker signal and each of the three residual signals into the bitstream separately by using a mono encoder.
Optionally, the apparatus further includes:
a second encoding module, configured to: if the encoding scheme of the current frame is the first encoding scheme, encode the HOA signal into the bitstream according to the first encoding scheme; and
a third encoding module, configured to: if the encoding scheme of the current frame is the second encoding scheme, encode the HOA signal into the bitstream according to the second encoding scheme.
Optionally, the first determining module includes:
a second determining submodule, configured to determine an initial encoding scheme of the current frame based on the HOA signal, where the initial encoding scheme is the first encoding scheme or the second encoding scheme;
a third determining submodule, configured to: if the initial encoding scheme of the current frame is the same as the initial encoding scheme of the previous frame, determine that the encoding scheme of the current frame is the initial encoding scheme of the current frame; and
a fourth determining submodule, configured to: if the initial encoding scheme of the current frame is the first encoding scheme and that of the previous frame is the second encoding scheme, or the initial encoding scheme of the current frame is the second encoding scheme and that of the previous frame is the first encoding scheme, determine that the encoding scheme of the current frame is the third encoding scheme.
Optionally, the apparatus further includes:
a fourth encoding module, configured to encode indication information of the initial encoding scheme of the current frame into the bitstream.
Optionally, the apparatus further includes:
a second determining module, configured to determine a value of a switching flag of the current frame, where the value of the switching flag is a first value when the encoding scheme of the current frame is the first encoding scheme or the second encoding scheme, and is a second value when the encoding scheme of the current frame is the third encoding scheme; and
a fifth encoding module, configured to encode the value of the switching flag into the bitstream.
Optionally, the apparatus further includes:
a sixth encoding module, configured to encode indication information of the encoding scheme of the current frame into the bitstream.
Optionally, the specified channels are consistent with the preset transport channels in the first encoding scheme.
According to a fourth aspect, a decoding apparatus is provided. The decoding apparatus has the function of implementing the behavior of the decoding method in the second aspect. The decoding apparatus includes one or more modules configured to implement the decoding method provided in the second aspect.
That is, a decoding apparatus is provided, and the apparatus includes:
a first obtaining module, configured to obtain a decoding scheme of a current frame based on a bitstream, where the decoding scheme of the current frame is one of a first decoding scheme, a second decoding scheme, and a third decoding scheme; the first decoding scheme is a higher-order ambisonics (HOA) decoding scheme based on directional audio decoding, the second decoding scheme is an HOA decoding scheme based on virtual speaker selection, and the third decoding scheme is a hybrid decoding scheme;
a first determining module, configured to: if the decoding scheme of the current frame is the third decoding scheme, determine, based on the bitstream, signals of specified channels in the HOA signal of the current frame, where the specified channels are some of all the channels of the HOA signal;
a second determining module, configured to determine, based on the signals of the specified channels, gains of one or more remaining channels in the HOA signal other than the specified channels;
a third determining module, configured to determine the signal of each of the one or more remaining channels based on the signals of the specified channels and the gains of the one or more remaining channels; and
a second obtaining module, configured to obtain a reconstructed HOA signal of the current frame based on the signals of the specified channels and the signals of the one or more remaining channels.
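The three reconstruction steps performed by the second determining module, the third determining module, and the second obtaining module can be sketched as below. This is only a structural illustration: the text does not state how the gains are computed or how a remaining channel is synthesized, so the default `gain_fn` (an energy ratio between the directional channels and W) and the scaling of W by each gain are placeholder assumptions, and all names are hypothetical. An HOA signal of order N has (N+1)^2 channels, so a third-order signal has 16.

```python
import numpy as np


def reconstruct_switch_frame(foa, num_channels, gain_fn=None):
    """foa: array of shape (4, n) holding W, X, Y, Z. Returns (num_channels, n)."""
    foa = np.asarray(foa, dtype=float)
    n_spec = foa.shape[0]
    if num_channels < n_spec:
        raise ValueError("num_channels must cover the specified channels")
    if gain_fn is None:
        # Placeholder rule (assumption): one shared gain, the capped square
        # root of mean directional energy over omnidirectional energy.
        def gain_fn(sig, n_rem):
            e_w = float(np.sum(sig[0] ** 2))
            e_dir = float(np.mean(np.sum(sig[1:] ** 2, axis=1)))
            g = 0.0 if e_w == 0.0 else min(1.0, float(np.sqrt(e_dir / e_w)))
            return np.full(n_rem, g)
    n_rem = num_channels - n_spec
    # Step 1: gains of the remaining channels from the specified channels.
    gains = np.asarray(gain_fn(foa, n_rem), dtype=float)
    # Step 2: remaining channel signals (assumption: scaled copies of W).
    remaining = gains[:, None] * foa[0][None, :]
    # Step 3: reconstructed HOA = specified channels + remaining channels.
    return np.vstack([foa, remaining])
```

A real decoder would replace `gain_fn` and the synthesis in step 2 with the codec's own rules; the sketch only fixes the data flow between the three modules.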
Optionally, the first determining module includes:
a first determining submodule, configured to determine a virtual speaker signal and a residual signal based on the bitstream; and
a second determining submodule, configured to determine the signals of the specified channels based on the virtual speaker signal and the residual signal.
Optionally, the first determining submodule is configured to:
decode the bitstream by using a stereo decoder to obtain three stereo signals; and
determine one virtual speaker signal and three residual signals based on the three stereo signals.
Optionally, the first determining submodule is configured to:
determine the one virtual speaker signal based on one of the three stereo signals; and
determine the three residual signals based on the other two of the three stereo signals.
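A minimal sketch of this unpacking step is given below. It assumes a fixed layout that mirrors the encoder-side pairing: the first stereo signal carries the virtual speaker signal plus a preset mono signal, the second carries two residual signals, and the third carries the last residual signal plus a preset mono signal. The layout and the name `unpack_stereo_signals` are assumptions; the text does not say how the decoder learns which residual signals were paired.

```python
def unpack_stereo_signals(stereo_signals):
    """Split three decoded (left, right) pairs into 1 + 3 mono signals."""
    if len(stereo_signals) != 3:
        raise ValueError("expected exactly three stereo signals")
    (virtual_speaker, _preset_1), (res_a, res_b), (res_c, _preset_2) = stereo_signals
    # The preset mono channels carry no information and are discarded.
    return virtual_speaker, [res_a, res_b, res_c]
```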
Optionally, the first determining submodule is configured to:
decode the bitstream by using a mono decoder to obtain one virtual speaker signal and three residual signals.
Optionally, the signals of the specified channels include a first-order ambisonics (FOA) signal, and the FOA signal includes an omnidirectional W signal and directional X, Y, and Z signals;
the first determining submodule is configured to:
determine the W signal based on the virtual speaker signal; and
determine the X signal, the Y signal, and the Z signal based on the residual signal and the W signal, or determine the X signal, the Y signal, and the Z signal based on the residual signal.
Optionally, the apparatus further includes:
a first decoding module, configured to: if the decoding scheme of the current frame is the first decoding scheme, obtain the reconstructed HOA signal of the current frame from the bitstream according to the first decoding scheme; and
a second decoding module, configured to: if the decoding scheme of the current frame is the second decoding scheme, obtain the reconstructed HOA signal of the current frame from the bitstream according to the second decoding scheme.
Optionally, the second decoding module includes:
a first obtaining submodule, configured to obtain an initial HOA signal from the bitstream according to the second decoding scheme;
a gain adjustment submodule, configured to: if the decoding scheme of the previous frame is the third decoding scheme, perform gain adjustment on the high-order part of the initial HOA signal based on the high-order gain of the previous frame; and
a second obtaining submodule, configured to obtain the reconstructed HOA signal based on the low-order part of the initial HOA signal and the gain-adjusted high-order part.
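The gain adjustment after a frame decoded with the third scheme can be illustrated as follows. The text only states that the high-order part is adjusted according to the previous frame's high-order gain; the linear fade from that gain to unity over the frame, the treatment of the first four channels as the low-order part, and the name `adjust_high_order` are all assumptions made for this sketch.

```python
import numpy as np


def adjust_high_order(initial_hoa, prev_gains, prev_was_third, n_low=4):
    """initial_hoa: (channels, n). prev_gains: one gain per high-order channel."""
    hoa = np.asarray(initial_hoa, dtype=float)
    if not prev_was_third:
        return hoa.copy()  # no adjustment needed
    low, high = hoa[:n_low], hoa[n_low:]
    # Assumed rule: fade each high-order channel from the previous frame's
    # gain (frame start) to unity (frame end).
    ramp = np.linspace(0.0, 1.0, hoa.shape[1])
    g = np.asarray(prev_gains, dtype=float)[:, None]
    weights = g * (1.0 - ramp) + ramp
    # Reconstructed HOA = untouched low-order part + adjusted high-order part.
    return np.vstack([low, high * weights])
```

Starting at the previous frame's gain avoids a sudden jump in high-order energy at the frame boundary, which matches the stated aim of a smooth transition in auditory quality.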
Optionally, the first obtaining module includes:
a first parsing submodule, configured to parse a value of a switching flag of the current frame from the bitstream;
a second parsing submodule, configured to: if the value of the switching flag is a first value, parse indication information of the decoding scheme of the current frame from the bitstream, where the indication information indicates that the decoding scheme of the current frame is the first decoding scheme or the second decoding scheme; and
a third determining submodule, configured to: if the value of the switching flag is a second value, determine that the decoding scheme of the current frame is the third decoding scheme.
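The flag-first parsing order can be sketched as below. The bit layout is hypothetical: the sketch assumes a one-bit switching flag whose first value is 0 and second value is 1, followed, only when the flag holds the first value, by a one-bit indication (0 for the first scheme, 1 for the second). None of these widths or values are given in the text.

```python
FIRST_VALUE, SECOND_VALUE = 0, 1  # assumed flag values


def parse_decoding_scheme(bits, pos=0):
    """bits: sequence of 0/1. Returns (scheme, next_pos) with scheme in {1, 2, 3}."""
    flag = bits[pos]
    pos += 1
    if flag == SECOND_VALUE:
        # Switching frame: no indication information follows the flag.
        return 3, pos
    indication = bits[pos]
    pos += 1
    return (1 if indication == 0 else 2), pos
```

Under this layout a frame decoded with the third scheme spends one bit on scheme signaling, and any other frame spends two.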
Optionally, the first obtaining module includes:
a third parsing submodule, configured to parse indication information of the decoding scheme of the current frame from the bitstream, where the indication information indicates that the decoding scheme of the current frame is the first decoding scheme, the second decoding scheme, or the third decoding scheme.
Optionally, the first obtaining module includes:
a fourth parsing submodule, configured to parse an initial decoding scheme of the current frame from the bitstream, where the initial decoding scheme is the first decoding scheme or the second decoding scheme;
a fourth determining submodule, configured to: if the initial decoding scheme of the current frame is the same as the initial decoding scheme of the previous frame, determine that the decoding scheme of the current frame is the initial decoding scheme of the current frame; and
a fifth determining submodule, configured to: if the initial decoding scheme of the current frame is the first decoding scheme and that of the previous frame is the second decoding scheme, or the initial decoding scheme of the current frame is the second decoding scheme and that of the previous frame is the first decoding scheme, determine that the decoding scheme of the current frame is the third decoding scheme.
According to a fifth aspect, an encoder-side device is provided. The encoder-side device includes a processor and a memory. The memory is configured to store a program for performing the encoding method provided in the first aspect and to store the data involved in implementing that encoding method. The processor is configured to execute the program stored in the memory. The operating apparatus of the storage device may further include a communication bus, which is used to establish a connection between the processor and the memory.
According to a sixth aspect, a decoder-side device is provided. The decoder-side device includes a processor and a memory. The memory is configured to store a program for performing the decoding method provided in the second aspect and to store the data involved in implementing that decoding method. The processor is configured to execute the program stored in the memory. The operating apparatus of the storage device may further include a communication bus, which is used to establish a connection between the processor and the memory.
According to a seventh aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores instructions that, when run on a computer, cause the computer to perform the encoding method according to the first aspect or the decoding method according to the second aspect.
According to an eighth aspect, a computer program product containing instructions is provided. When the computer program product runs on a computer, the computer is caused to perform the encoding method according to the first aspect or the decoding method according to the second aspect.
The technical effects obtained in the third, fourth, fifth, sixth, seventh, and eighth aspects are similar to those obtained by the corresponding technical means in the first aspect or the second aspect. Details are not repeated here.
The technical solutions provided in the embodiments of this application bring at least the following beneficial effects:
In the embodiments of this application, two schemes (the codec scheme based on virtual speaker selection and the codec scheme based on directional audio coding) are combined to encode and decode the HOA signals of audio frames. That is, a suitable codec scheme is selected for each audio frame, which can improve the compression ratio of the audio signal. In addition, to achieve a smooth transition in auditory quality when switching between codec schemes, some audio frames are not encoded and decoded directly with either of the two schemes. Instead, a new codec scheme is used for these frames: the signals of specified channels in their HOA signals are encoded into the bitstream. This compromise scheme allows the auditory quality to transition smoothly when the HOA signal recovered by decoding is rendered and played.
Description of Drawings
FIG. 1 is a schematic diagram of an implementation environment according to an embodiment of this application;
FIG. 2 is a schematic diagram of an implementation environment of a terminal scenario according to an embodiment of this application;
FIG. 3 is a schematic diagram of an implementation environment of a transcoding scenario of a wireless or core network device according to an embodiment of this application;
FIG. 4 is a schematic diagram of an implementation environment of a broadcast television scenario according to an embodiment of this application;
FIG. 5 is a schematic diagram of an implementation environment of a virtual reality streaming scenario according to an embodiment of this application;
FIG. 6 is a flowchart of an encoding method according to an embodiment of this application;
FIG. 7 is a schematic diagram of a switching-frame encoding scheme according to an embodiment of this application;
FIG. 8 is a schematic diagram of an HOA encoding scheme based on virtual speaker selection according to an embodiment of this application;
FIG. 9 is a schematic diagram of a DirAC-based HOA encoding scheme according to an embodiment of this application;
FIG. 10 is a flowchart of another encoding method according to an embodiment of this application;
FIG. 11 is a flowchart of a decoding method according to an embodiment of this application;
FIG. 12 is a schematic diagram of a switching-frame decoding scheme according to an embodiment of this application;
FIG. 13 is a schematic diagram of an HOA decoding scheme based on virtual speaker selection according to an embodiment of this application;
FIG. 14 is a schematic diagram of a DirAC-based HOA decoding scheme according to an embodiment of this application;
FIG. 15 is a flowchart of another decoding method according to an embodiment of this application;
FIG. 16 is a schematic structural diagram of an encoding apparatus according to an embodiment of this application;
FIG. 17 is a schematic structural diagram of a decoding apparatus according to an embodiment of this application;
FIG. 18 is a schematic block diagram of a codec apparatus according to an embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the implementations of this application are described in further detail below with reference to the accompanying drawings.
Before the encoding and decoding methods provided in the embodiments of this application are explained in detail, the implementation environment involved in the embodiments is introduced.
Refer to FIG. 1, which is a schematic diagram of an implementation environment according to an embodiment of this application. The implementation environment includes a source device 10, a destination device 20, a link 30, and a storage device 40. The source device 10 can generate encoded media data and may therefore also be called a media data encoding device. The destination device 20 can decode the encoded media data generated by the source device 10 and may therefore also be called a media data decoding device. The link 30 can receive the encoded media data generated by the source device 10 and transmit it to the destination device 20. The storage device 40 can receive and store the encoded media data generated by the source device 10; in this case, the destination device 20 can obtain the encoded media data directly from the storage device 40. Alternatively, the storage device 40 may correspond to a file server or another intermediate storage device that can hold the encoded media data generated by the source device 10; in this case, the destination device 20 can stream or download the encoded media data stored on the storage device 40.
The source device 10 and the destination device 20 may each include one or more processors and a memory coupled to the one or more processors. The memory may include random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures accessible by a computer. For example, the source device 10 and the destination device 20 may each be a desktop computer, a mobile computing device, a notebook (for example, laptop) computer, a tablet computer, a set-top box, a telephone handset such as a so-called "smart" phone, a television, a camera, a display device, a digital media player, a video game console, an in-vehicle computer, or the like.
The link 30 may include one or more media or devices capable of transmitting the encoded media data from the source device 10 to the destination device 20. In a possible implementation, the link 30 may include one or more communication media that enable the source device 10 to send the encoded media data directly to the destination device 20 in real time. In the embodiments of this application, the source device 10 may modulate the encoded media data according to a communication standard, such as a wireless communication protocol, and send the modulated media data to the destination device 20. The one or more communication media may include wireless and/or wired communication media, for example, a radio frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (for example, the Internet). The one or more communication media may include routers, switches, base stations, or other equipment that facilitates communication from the source device 10 to the destination device 20. This is not specifically limited in the embodiments of this application.
In a possible implementation, the storage device 40 may store the encoded media data received from the source device 10, and the destination device 20 may obtain the encoded media data directly from the storage device 40. In this case, the storage device 40 may include any of a variety of distributed or locally accessed data storage media, for example, a hard disk drive, a Blu-ray disc, a digital versatile disc (DVD), a compact disc read-only memory (CD-ROM), flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded media data.
In a possible implementation, the storage device 40 may correspond to a file server or another intermediate storage device that can hold the encoded media data generated by the source device 10, and the destination device 20 may stream or download the media data stored on the storage device 40. The file server may be any type of server capable of storing encoded media data and sending it to the destination device 20. In a possible implementation, the file server may include a web server, a file transfer protocol (FTP) server, a network attached storage (NAS) device, a local disk drive, or the like. The destination device 20 may obtain the encoded media data through any standard data connection, including an Internet connection. A standard data connection may include a wireless channel (for example, a Wi-Fi connection), a wired connection (for example, a digital subscriber line (DSL) or a cable modem), or a combination of the two that is suitable for obtaining encoded media data stored on a file server. The transmission of encoded media data from the storage device 40 may be a streaming transmission, a download transmission, or a combination of both.
图1所示的实施环境仅为一种可能的实现方式,并且本申请实施例的技术不仅可以适用于图1所示的可以对媒体数据进行编码的源装置10,以及可以对经编码的媒体数据进行解码的目的地装置20,还可以适用于其他可以对媒体数据进行编码和对经编码的媒体数据进行解码的装置,本申请实施例对此不做具体限定。The implementation environment shown in Figure 1 is only a possible implementation, and the technology of the embodiment of the present application is not only applicable to the source device 10 shown in Figure 1 that can encode media data, but also can encode the encoded media The destination device 20 for decoding data may also be applicable to other devices capable of encoding media data and decoding encoded media data, which is not specifically limited in this embodiment of the present application.
在图1所示的实施环境中,源装置10包括数据源120、编码器100和输出接口140。在一些实施例中,输出接口140可以包括调节器/解调器(调制解调器)和/或发送器,其中发送器也可以称为发射器。数据源120可以包括图像捕获装置(例如,摄像机等)、含有先前捕获的媒体数据的存档、用于从媒体数据内容提供者接收媒体数据的馈入接口,和/或用于产生媒体数据的计算机图形***,或媒体数据的这些来源的组合。In the implementation environment shown in FIG. 1 , the source device 10 includes a data source 120 , an encoder 100 and an output interface 140 . In some embodiments, output interface 140 may include a conditioner/demodulator (modem) and/or a transmitter, where a transmitter may also be referred to as a transmitter. Data source 120 may include an image capture device (e.g., video camera, etc.), an archive containing previously captured media data, a feed interface for receiving media data from a media data content provider, and/or a computer for generating media data graphics system, or a combination of these sources of media data.
数据源120可以向编码器100发送媒体数据,编码器100可以对接收到由数据源120发送的媒体数据进行编码,得到经编码的媒体数据。编码器可以将经编码的媒体数据发送给输出接口。在一些实施例中,源装置10经由输出接口140将经编码的媒体数据直接发送到目的地装置20。在其它实施例中,经编码的媒体数据还可存储到存储装置40上,供目的地装置20以后获取并用于解码和/或显示。The data source 120 may send media data to the encoder 100, and the encoder 100 may encode the received media data sent by the data source 120 to obtain encoded media data. An encoder may send encoded media data to an output interface. In some embodiments, source device 10 sends the encoded media data directly to destination device 20 via output interface 140 . In other embodiments, encoded media data may also be stored on storage device 40 for later retrieval by destination device 20 for decoding and/or display.
In the implementation environment shown in FIG. 1, the destination device 20 includes an input interface 240, a decoder 200, and a display device 220. In some embodiments, the input interface 240 includes a receiver and/or a modem. The input interface 240 may receive the encoded media data via the link 30 and/or from the storage device 40 and then pass it to the decoder 200. The decoder 200 may decode the received encoded media data to obtain decoded media data, and may send the decoded media data to the display device 220. The display device 220 may be integrated with the destination device 20 or may be external to it. In general, the display device 220 displays the decoded media data. The display device 220 may be any of various types of display device, for example a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or another type of display device.
Although not shown in FIG. 1, in some aspects the encoder 100 and the decoder 200 may each be integrated with an encoder and a decoder, and may include appropriate multiplexer-demultiplexer (MUX-DEMUX) units or other hardware and software to handle the encoding of both audio and video in a common data stream or in separate data streams. In some embodiments, if applicable, the MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol or to other protocols such as the user datagram protocol (UDP).
Each of the encoder 100 and the decoder 200 may be any of the following circuits: one or more microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), discrete logic, hardware, or any combination thereof. If the techniques of the embodiments of this application are implemented partially in software, a device may store the instructions for the software in a suitable non-transitory computer-readable storage medium and may execute the instructions in hardware using one or more processors, thereby implementing the techniques of the embodiments of this application. Any of the foregoing (including hardware, software, a combination of hardware and software, and so on) may be regarded as one or more processors. Each of the encoder 100 and the decoder 200 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in the corresponding device.
The embodiments of this application may generally refer to the encoder 100 as "signaling" or "sending" certain information to another device, such as the decoder 200. The term "signaling" or "sending" may generally refer to the transfer of syntax elements and/or other data used to decode compressed media data. This transfer may occur in real time or near real time. Alternatively, this communication may occur over a period of time, for example when syntax elements are stored in an encoded bitstream on a computer-readable storage medium at encoding time; the decoding device may then retrieve the syntax elements at any time after they have been stored on the medium.
The encoding and decoding methods provided in the embodiments of this application can be applied to a variety of scenarios. Several of these scenarios are introduced below, taking the case in which the media data to be encoded is an HOA signal as an example.
Referring to FIG. 2, FIG. 2 is a schematic diagram of an implementation environment in which an encoding and decoding method provided by an embodiment of this application is applied to a terminal scenario. The implementation environment includes a first terminal 101 and a second terminal 201, which are communicatively connected. The communication connection may be wireless or wired, which is not limited in the embodiments of this application.
The first terminal 101 may be either a sending-end device or a receiving-end device, and likewise for the second terminal 201. For example, when the first terminal 101 is the sending-end device, the second terminal 201 is the receiving-end device; when the first terminal 101 is the receiving-end device, the second terminal 201 is the sending-end device.
The following description takes the first terminal 101 as the sending-end device and the second terminal 201 as the receiving-end device.
Both the first terminal 101 and the second terminal 201 include an audio capture module, an audio playback module, an encoder, a decoder, a channel encoding module, and a channel decoding module. In the embodiments of this application, the encoder is a three-dimensional audio encoder and the decoder is a three-dimensional audio decoder.
The audio capture module in the first terminal 101 captures the HOA signal and passes it to the encoder. The encoder encodes the HOA signal using the encoding method provided in the embodiments of this application; this encoding may be called source coding. Then, so that the HOA signal can be transmitted over a channel, the channel encoding module additionally performs channel coding, and the resulting bitstream is transmitted over a digital channel through a wireless or wired network communication device.
The second terminal 201 receives the bitstream transmitted over the digital channel through a wireless or wired network communication device. The channel decoding module performs channel decoding on the bitstream, the decoder then decodes it using the decoding method provided in the embodiments of this application to obtain the HOA signal, and the audio playback module plays the signal back.
The first terminal 101 and the second terminal 201 may be any electronic product that supports human-computer interaction with a user through one or more of a keyboard, touch pad, touch screen, remote control, voice interaction, or handwriting device, for example a personal computer (PC), mobile phone, smartphone, personal digital assistant (PDA), wearable device, pocket PC (PPC), tablet computer, smart in-vehicle unit, smart TV, or smart speaker.
Those skilled in the art should understand that the above terminals are only examples. Other existing or future terminals, if applicable to the embodiments of this application, also fall within the scope of protection of the embodiments of this application and are incorporated herein by reference.
Referring to FIG. 3, FIG. 3 is a schematic diagram of an implementation environment in which an encoding and decoding method provided by an embodiment of this application is applied to a transcoding scenario in a wireless or core network device. The implementation environment includes a channel decoding module, an audio decoder, an audio encoder, and a channel encoding module. In the embodiments of this application, the audio encoder is a three-dimensional audio encoder and the audio decoder is a three-dimensional audio decoder.
The audio decoder may be a decoder that uses the decoding method provided in the embodiments of this application or a decoder that uses another decoding method. Likewise, the audio encoder may be an encoder that uses the encoding method provided in the embodiments of this application or an encoder that uses another encoding method. When the audio decoder uses the decoding method provided in the embodiments of this application, the audio encoder uses another encoding method; when the audio decoder uses another decoding method, the audio encoder uses the encoding method provided in the embodiments of this application.
In the first case, the audio decoder uses the decoding method provided in the embodiments of this application, and the audio encoder uses another encoding method.
In this case, the channel decoding module performs channel decoding on the received bitstream, the audio decoder performs source decoding using the decoding method provided in the embodiments of this application, and the audio encoder then encodes the result using the other encoding method. This converts one format into another, that is, transcoding. The result is then channel coded and sent.
In the second case, the audio decoder uses another decoding method, and the audio encoder uses the encoding method provided in the embodiments of this application.
In this case, the channel decoding module performs channel decoding on the received bitstream, the audio decoder performs source decoding using the other decoding method, and the audio encoder then encodes the result using the encoding method provided in the embodiments of this application. This converts one format into another, that is, transcoding. The result is then channel coded and sent.
The wireless device may be a wireless access point, a wireless router, a wireless connector, or the like. The core network device may be a mobility management entity, a gateway, or the like.
Those skilled in the art should understand that the above wireless devices and core network devices are only examples. Other existing or future wireless or core network devices, if applicable to the embodiments of this application, also fall within the scope of protection of the embodiments of this application and are incorporated herein by reference.
Referring to FIG. 4, FIG. 4 is a schematic diagram of an implementation environment in which an encoding and decoding method provided by an embodiment of this application is applied to a broadcast television scenario. The broadcast television scenario is divided into a live broadcast scenario and a post-production scenario. For the live broadcast scenario, the implementation environment includes a live-program 3D sound production module, a 3D sound encoding module, a set-top box, and a speaker group, where the set-top box includes a 3D sound decoding module. For the post-production scenario, the implementation environment includes a post-production-program 3D sound production module, a 3D sound encoding module, a network receiver, a mobile terminal, earphones, and the like.
In the live broadcast scenario, the live-program 3D sound production module produces a 3D sound signal (such as an HOA signal), and the encoding method of the embodiments of this application is applied to this signal to obtain a bitstream. The bitstream is transmitted to the user side over the broadcast television network, where the 3D sound decoder in the set-top box decodes it using the decoding method provided in the embodiments of this application to reconstruct the 3D sound signal, which is played back by the speaker group. Alternatively, the bitstream is transmitted to the user side over the Internet, where the 3D sound decoder in the network receiver decodes it using the decoding method provided in the embodiments of this application to reconstruct the 3D sound signal, which is played back by the speaker group. As a further alternative, the bitstream is transmitted to the user side over the Internet, where the 3D sound decoder in the mobile terminal decodes it using the decoding method provided in the embodiments of this application to reconstruct the 3D sound signal, which is played back through earphones.
In the post-production scenario, the post-production-program 3D sound production module produces a 3D sound signal, and the encoding method of the embodiments of this application is applied to this signal to obtain a bitstream. The bitstream is transmitted to the user side over the broadcast television network, where the 3D sound decoder in the set-top box decodes it using the decoding method provided in the embodiments of this application to reconstruct the 3D sound signal, which is played back by the speaker group. Alternatively, the bitstream is transmitted to the user side over the Internet, where the 3D sound decoder in the network receiver decodes it using the decoding method provided in the embodiments of this application to reconstruct the 3D sound signal, which is played back by the speaker group. As a further alternative, the bitstream is transmitted to the user side over the Internet, where the 3D sound decoder in the mobile terminal decodes it using the decoding method provided in the embodiments of this application to reconstruct the 3D sound signal, which is played back through earphones.
Referring to FIG. 5, FIG. 5 is a schematic diagram of an implementation environment in which an encoding and decoding method provided by an embodiment of this application is applied to a virtual reality streaming scenario. The implementation environment includes an encoding end and a decoding end. The encoding end includes a capture module, a preprocessing module, an encoding module, a packing module, and a sending module. The decoding end includes an unpacking module, a decoding module, a rendering module, and earphones.
The capture module captures the HOA signal, and the preprocessing module then preprocesses it. The preprocessing operations include filtering out the low-frequency part of the HOA signal, usually with 20 Hz or 50 Hz as the cut-off point, and extracting orientation information from the HOA signal. The encoding module then encodes the signal using the encoding method provided in the embodiments of this application. After encoding, the packing module packs the result and the sending module sends it to the decoding end.
At the decoding end, the unpacking module first unpacks the data, and the decoding module then decodes it using the decoding method provided in the embodiments of this application. The rendering module performs binaural rendering on the decoded signal, and the rendered signal is mapped to the listener's earphones. The earphones may be standalone earphones or earphones on a virtual-reality glasses device.
It should be noted that the system architectures and service scenarios described in the embodiments of this application are intended to explain the technical solutions of the embodiments more clearly and do not limit those technical solutions. Those of ordinary skill in the art will appreciate that, as system architectures evolve and new service scenarios emerge, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.
The encoding and decoding methods provided in the embodiments of this application are explained in detail below. It should be noted that, with reference to the implementation environment shown in FIG. 1, any of the encoding methods below may be performed by the encoder 100 in the source device 10, and any of the decoding methods below may be performed by the decoder 200 in the destination device 20.
FIG. 6 is a flowchart of an encoding method provided by an embodiment of this application. The encoding method is applied at the encoding end. Referring to FIG. 6, the method includes the following steps.
Step 601: Determine the encoding scheme of the current frame according to the HOA signal of the current frame.
For the HOA signals of multiple audio frames to be encoded, the encoding end encodes frame by frame. The HOA signal of an audio frame is an audio signal obtained through HOA capture technology. An HOA signal is a scene audio signal and also a three-dimensional audio signal; it is the audio signal obtained by capturing the sound field at the position of a microphone in space, and the captured audio signal is called the original HOA signal. The HOA signal of an audio frame may also be an HOA signal obtained by converting a three-dimensional audio signal in another format, for example by converting a 5.1-channel signal into an HOA signal, or by converting a three-dimensional audio signal that mixes a 5.1-channel signal with object audio into an HOA signal. Optionally, the HOA signal of the audio frame to be encoded is a time-domain signal or a frequency-domain signal, and may contain all channels of the HOA signal or only some of them. For example, if the order of the HOA signal of the audio frame is 3, the number of channels of the HOA signal is 16, the frame length is 20 ms, and the sampling rate is 48 kHz, then the HOA signal of the audio frame to be encoded contains 16 channels of signal, each with 960 sampling points.
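The figures in the example above can be checked with a short calculation. The sketch below assumes a full three-dimensional HOA signal, for which an order-N signal has (N+1)^2 channels; the function names are illustrative and do not come from the application.

```python
def hoa_channel_count(order: int) -> int:
    """Channel count of a full 3D HOA signal of the given order: (N + 1)^2."""
    return (order + 1) ** 2


def samples_per_frame(frame_ms: int, sample_rate_hz: int) -> int:
    """Number of sampling points per channel in one frame."""
    return frame_ms * sample_rate_hz // 1000


# Values from the example in the text: order 3, 20 ms frames, 48 kHz.
channels = hoa_channel_count(3)
samples = samples_per_frame(20, 48_000)
print(channels, samples)
```

With the values from the text this gives 16 channels and 960 sampling points per channel.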
To reduce computational complexity, if the HOA signal of the audio frame obtained by the encoding end is the original HOA signal and has a large number of sampling points or frequency bins, the encoding end may downsample the original HOA signal to obtain the HOA signal of the audio frame to be encoded. For example, the encoding end downsamples the original HOA signal by a factor of 1/Q to reduce the number of sampling points or frequency bins of the HOA signal to be encoded. In the embodiments of this application, each channel of the original HOA signal contains 960 sampling points; after 1/120 downsampling, each channel of the HOA signal to be encoded contains 8 sampling points.
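As a rough illustration of 1/Q downsampling, the sketch below keeps every Q-th point of a channel. This is an assumption made for illustration: the text does not say how the downsampling is performed (for example, whether any filtering or averaging is applied first), and the function name is hypothetical.

```python
def downsample(channel, q):
    """Keep every q-th point of one channel (plain decimation, an assumed method)."""
    if q <= 0:
        raise ValueError("q must be a positive integer")
    return channel[::q]


# One channel with 960 points, as in the example, reduced by 1/120.
original = list(range(960))
reduced = downsample(original, 120)
print(len(reduced))
```

With Q = 120 a 960-point channel is reduced to 8 points, matching the figures in the text.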
In the embodiments of this application, the encoding method at the encoding end is described by taking the encoding of the current frame as an example. The current frame is an audio frame to be encoded. That is, the encoding end obtains the HOA signal of the current frame and encodes it using the encoding method provided in the embodiments of this application.
It should be noted that, to achieve a high compression rate for audio frames of different sound field types, a suitable codec scheme needs to be selected for each audio frame according to its sound field type. In the embodiments of this application, the encoding end first determines the initial encoding scheme of the current frame according to the HOA signal of the current frame; the initial encoding scheme is either the first encoding scheme or the second encoding scheme. The encoding end then compares the initial encoding scheme of the current frame with the initial encoding scheme of the previous frame to decide whether to encode the HOA signal of the current frame with the first, second, or third encoding scheme. If the two initial encoding schemes are the same, the encoding end encodes the HOA signal of the current frame with the encoding scheme that matches the initial encoding scheme of the current frame. If they differ, the encoding end encodes the HOA signal of the current frame with the switching-frame encoding scheme.
In the embodiments of this application, the encoding scheme of the current frame is one of the first encoding scheme, the second encoding scheme, and the third encoding scheme. The first encoding scheme is a DirAC-based HOA encoding scheme, the second encoding scheme is an HOA encoding scheme based on virtual speaker selection, and the third encoding scheme is a hybrid encoding scheme. Optionally, the hybrid encoding scheme is also called a switching-frame encoding scheme. The third encoding scheme is a switching-frame encoding scheme provided by the embodiments of this application, and is intended to give a smooth transition in perceived audio quality when switching between different codec schemes. These three encoding schemes are described in detail below. In the embodiments of this application, the HOA encoding scheme based on virtual speaker selection is also called the MP-based HOA encoding scheme.
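The decision between the three schemes can be sketched as follows. The enum and function names are hypothetical, and the handling of the very first frame (which has no previous frame) is not covered because the text does not describe it.

```python
from enum import Enum


class Scheme(Enum):
    FIRST = 1   # DirAC-based HOA encoding scheme
    SECOND = 2  # HOA encoding scheme based on virtual speaker selection
    THIRD = 3   # hybrid (switching-frame) encoding scheme


def select_scheme(initial_current: Scheme, initial_previous: Scheme) -> Scheme:
    """Pick the encoding scheme of the current frame from two initial schemes."""
    if initial_current == initial_previous:
        # No switch between frames: encode with the current frame's initial scheme.
        return initial_current
    # The initial scheme changed between frames: use the switching-frame scheme.
    return Scheme.THIRD


print(select_scheme(Scheme.FIRST, Scheme.FIRST))
print(select_scheme(Scheme.SECOND, Scheme.FIRST))
```

Note that the initial scheme is only ever the first or second scheme; the third scheme appears only as the outcome of a change between consecutive frames.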
In the embodiments of this application, the encoding end determines the initial encoding scheme of the current frame according to the HOA signal of the current frame. The encoding end then determines the encoding scheme of the current frame based on the initial encoding scheme of the current frame and the initial encoding scheme of the previous frame. It should be noted that the embodiments of this application do not limit how the encoding end determines the initial encoding scheme.
Optionally, the encoding end performs sound field type analysis on the HOA signal of the current frame to obtain a sound field classification result for the current frame, and determines the initial encoding scheme of the current frame based on that result. It should be noted that the embodiments of this application do not limit the method of sound field type analysis. For example, the encoding end may perform singular value decomposition on the HOA signal of the current frame, or apply another linear decomposition to the HOA signal, to carry out the sound field type analysis.
Optionally, the sound field classification result includes the number of distinct sound sources. Taking the case in which the encoding end performs sound field type analysis directly on the HOA signal of the current frame as an example, one way of obtaining the sound field classification result of the current frame is as follows. The encoding end performs singular value decomposition on the HOA signal of the current frame to obtain M singular values. The encoding end calculates the ratio of the i-th singular value to the (i+1)-th singular value among the M singular values, where i = 1, 2, …, M-1, to obtain M-1 sound field classification parameters. Based on these M-1 sound field classification parameters, the encoding end determines the number of distinct sound sources corresponding to the current frame. Here M = min(L, K), where L is the number of channels of the HOA signal of the current frame, K is the number of signal points in each channel of the HOA signal of the current frame, and min denotes taking the minimum. If the HOA signal is a time-domain signal, the number of signal points is the number of sampling points; if the HOA signal is a frequency-domain signal, the number of signal points is the number of frequency bins.
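The ratio computation can be sketched as below. The sketch takes the singular values as input, assumed to be in descending order as singular value decompositions conventionally return them (for example from `numpy.linalg.svd(x, compute_uv=False)` applied to the L×K signal matrix). The small `eps` guard against division by zero is an added assumption, not something the text specifies, and the function name is illustrative.

```python
def sound_field_parameters(singular_values, eps=1e-12):
    """Ratios of adjacent singular values: M values in, M - 1 parameters out."""
    v = list(singular_values)
    # eps is an assumed safeguard for zero singular values, not from the text.
    return [v[i] / max(v[i + 1], eps) for i in range(len(v) - 1)]


# Three illustrative singular values yield two classification parameters.
params = sound_field_parameters([200.0, 1.0, 0.5])
print(params)
```

A large ratio between neighbouring singular values indicates a sharp drop in energy after that index, which is what the subsequent threshold test looks for.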
Optionally, denote the M-1 sound field classification parameters as temp[i], i = 0, 1, …, M-2. One way for the encoding end to determine the number of distinct sound sources corresponding to the current frame from these parameters is to perform the following procedure in sequence, starting from i = 0: determine whether temp[i] is greater than or equal to a preset distinct-sound-source decision threshold. If temp[i] is less than the threshold in the current round, update i to i+1 and continue with the next round. If temp[i] is greater than or equal to the threshold in the current round, determine that the number of distinct sound sources corresponding to the current frame equals i+1 and end the procedure. Optionally, the distinct-sound-source decision threshold is 30, 80, 100, or the like. The threshold is a preset value that can be set based on experience or statistics.
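The search procedure above can be sketched as a simple loop. The text does not say what happens when no parameter reaches the threshold; returning 0 in that case is an assumption made here for illustration, as is the function name.

```python
def count_distinct_sources(temp, threshold=100.0):
    """Return i + 1 for the first index i with temp[i] >= threshold."""
    for i, ratio in enumerate(temp):
        if ratio >= threshold:
            return i + 1
    # Assumption: no parameter reached the threshold; the text leaves this case open.
    return 0


print(count_distinct_sources([150.0, 2.0, 3.0]))
print(count_distinct_sources([5.0, 120.0, 3.0]))
```

The first call stops at i = 0 and the second at i = 1, giving 1 and 2 distinct sound sources respectively.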
Correspondingly, in one implementation, after the number of distinct sound sources corresponding to the current frame is determined, if that number is greater than a first threshold and less than a second threshold, the encoding end determines that the initial encoding scheme of the current frame is the second encoding scheme. If that number is not greater than the first threshold or not less than the second threshold, the encoding end determines that the initial encoding scheme of the current frame is the first encoding scheme. The first threshold is less than the second threshold. Optionally, the first threshold is 0 or another value, and the second threshold is 3 or another value. The first and second thresholds are preset values that can be set based on experience or statistics.
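The threshold rule can be written as a single comparison. The sketch below uses the optional values from the text (first threshold 0, second threshold 3) as defaults; the function name and the string labels are illustrative only.

```python
def initial_scheme(n, first_threshold=0, second_threshold=3):
    """Map the number of distinct sound sources to the initial encoding scheme."""
    if first_threshold < n < second_threshold:
        return "second"  # HOA encoding scheme based on virtual speaker selection
    return "first"       # DirAC-based HOA encoding scheme


for n in range(5):
    print(n, initial_scheme(n))
```

With the default thresholds, only frames with one or two distinct sound sources are assigned the second encoding scheme; all other counts fall back to the first.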
For example, assume that the HOA signal of the current frame has L=16 channels and K=8 frequency bins per channel, so that min(L,K)=8. The encoder side performs singular value decomposition on the HOA signal of the current frame to obtain singular values v[i], i=0,1,…,min(L,K)-1. The encoder side computes the ratio between adjacent singular values and uses the ratios as the sound field classification result temp[i] of the current frame: temp[i]=v[i]/v[i+1], i=0,1,…,min(L,K)-2. Assuming that the heterogeneous sound source decision threshold is 100, the quantity n of heterogeneous sound sources is determined as follows: starting from i=0, determine whether temp[i]≥100; if so, stop; otherwise set i=i+1 and continue. When the procedure stops, n equals the index i at which it stopped plus 1. For example, at i=0, if temp[0]≥100, the procedure stops and n equals 1; otherwise i is set to 1 and the check continues; at i=1, if temp[1]≥100, the procedure stops and n equals i+1=2. Assume that the first threshold is 0 and the second threshold is 3. If the quantity n of heterogeneous sound sources corresponding to the current frame satisfies 0<n<3, the encoder side determines that the initial coding scheme of the current frame is the second coding scheme. If n satisfies n=0 or n≥3, the encoder side determines that the initial coding scheme of the current frame is the first coding scheme.
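The worked example above can be sketched in Python as follows. This is a minimal illustration, not the implementation of this application: the function names, the `eps` guard against division by zero, and the choice to return n=0 when no ratio reaches the threshold are assumptions made for the sketch. The singular values are taken as given, since the text does not specify how the decomposition is computed.

```python
from typing import List, Sequence

DIRAC = "DirAC"  # first coding scheme (DirAC-based HOA coding)
MP = "MP"        # second coding scheme (MP-based HOA coding)


def classification_params(singular_values: Sequence[float],
                          eps: float = 1e-12) -> List[float]:
    """temp[i] = v[i] / v[i+1] for adjacent singular values.

    eps is an assumption of this sketch; it only avoids division by zero.
    """
    v = singular_values
    return [v[i] / max(v[i + 1], eps) for i in range(len(v) - 1)]


def count_heterogeneous_sources(temp: Sequence[float],
                                decision_threshold: float = 100.0) -> int:
    """Return i+1 for the first i with temp[i] >= decision_threshold."""
    for i, ratio in enumerate(temp):
        if ratio >= decision_threshold:
            return i + 1
    # Assumption: if no ratio reaches the threshold, report zero sources.
    return 0


def initial_scheme(n: int, first_threshold: int = 0,
                   second_threshold: int = 3) -> str:
    """Second scheme if first_threshold < n < second_threshold, else first."""
    return MP if first_threshold < n < second_threshold else DIRAC


# Hypothetical singular values for one frame with min(L, K) = 8.
v = [500.0, 400.0, 2.0, 1.0, 0.5, 0.4, 0.3, 0.2]
temp = classification_params(v)
n = count_heterogeneous_sources(temp)
print(n, initial_scheme(n))
```

With these hypothetical values, temp[0]=1.25 is below 100 and temp[1]=200 reaches it, so n=2 and the second coding scheme is selected.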
Optionally, the sound field classification result includes a sound field type, and the sound field type is either a diffuse sound field or a heterogeneous sound field. The sound field type can be determined from the quantity of heterogeneous sound sources obtained by the foregoing method; that is, the encoder side determines the sound field type of the current frame based on the quantity of heterogeneous sound sources corresponding to the current frame. For example, if the quantity of heterogeneous sound sources corresponding to the current frame is greater than the first threshold and less than the second threshold, the encoder side determines that the sound field type of the current frame is a heterogeneous sound field. If the quantity is not greater than the first threshold or not less than the second threshold, the encoder side determines that the sound field type of the current frame is a diffuse sound field. Correspondingly, if the sound field type of the current frame is a heterogeneous sound field, the encoder side determines that the initial coding scheme of the current frame is the second coding scheme, namely the MP-based HOA coding scheme. If the sound field type of the current frame is a diffuse sound field, the encoder side determines that the initial coding scheme of the current frame is the first coding scheme, namely the DirAC-based HOA coding scheme.
In some embodiments, after the initial coding scheme of each audio frame (including the current frame) is determined in the foregoing manner, the initial coding schemes of successive audio frames may switch back and forth, so that many switching frames ultimately need to be encoded. Because switching between coding schemes introduces a number of problems that must be solved, those problems can be reduced by reducing the number of switching frames. To reduce the number of switching frames, the encoder side may first determine a predicted coding scheme of the current frame based on the sound field classification result of the current frame; that is, the encoder side uses the initial coding scheme determined by the foregoing method as the predicted coding scheme. The encoder side then updates the initial coding scheme of the current frame based on the predicted coding scheme by using a sliding window, for example through hangover processing.
Optionally, assume that the length of the sliding window is N, and that the sliding window contains the predicted coding scheme of the current frame and the updated initial coding schemes of the N-1 frames preceding the current frame. If the number of occurrences of the second coding scheme in the sliding window is not less than a first specified threshold, the encoder side updates the initial coding scheme of the current frame to the second coding scheme. If the number of occurrences of the second coding scheme in the sliding window is less than the first specified threshold, the encoder side updates the initial coding scheme of the current frame to the first coding scheme. The length N of the sliding window is 8, 10, 15, or the like, and the first specified threshold is 5, 6, 7, or the like; the embodiments of this application do not limit the length of the sliding window or the value of the first specified threshold. For example, assume that the length of the sliding window is 10 and the first specified threshold is 7, so that the sliding window contains the predicted coding scheme of the current frame and the updated initial coding schemes of the 9 frames preceding the current frame. If the number of occurrences of the second coding scheme in the sliding window is not less than 7, the encoder side updates the initial coding scheme of the current frame to the second coding scheme; if it is less than 7, the encoder side updates the initial coding scheme of the current frame to the first coding scheme.
Alternatively, if the number of occurrences of the first coding scheme in the sliding window is not less than a second specified threshold, the encoder side updates the initial coding scheme of the current frame to the first coding scheme. If the number of occurrences of the first coding scheme in the sliding window is less than the second specified threshold, the encoder side updates the initial coding scheme of the current frame to the second coding scheme. The second specified threshold is 5, 6, 7, or the like; the embodiments of this application do not limit its value. Optionally, the second specified threshold is the same as or different from the first specified threshold.
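The sliding-window update described above can be sketched as a single decision over one window. This is a minimal sketch under stated assumptions: the function name and parameters are hypothetical, the caller is assumed to supply the updated initial coding schemes of the preceding N-1 frames, and the `counted` parameter selects which of the two variants (counting the second scheme or counting the first scheme) is applied.

```python
from typing import Sequence

DIRAC = "DirAC"  # first coding scheme
MP = "MP"        # second coding scheme


def update_initial_scheme(predicted: str,
                          previous_updated: Sequence[str],
                          count_threshold: int = 7,
                          counted: str = MP) -> str:
    """Update the current frame's initial scheme from one sliding window.

    predicted:        predicted coding scheme of the current frame
    previous_updated: updated initial schemes of the preceding N-1 frames
    counted:          scheme whose occurrences are counted in the window
    """
    other = DIRAC if counted == MP else MP
    window = list(previous_updated) + [predicted]  # window length N
    occurrences = sum(1 for scheme in window if scheme == counted)
    return counted if occurrences >= count_threshold else other


# Window length 10, first specified threshold 7 (values from the text).
previous = [MP] * 6 + [DIRAC] * 3
print(update_initial_scheme(MP, previous))     # 7 occurrences of MP
print(update_initial_scheme(DIRAC, previous))  # 6 occurrences of MP
```

In the first call the window holds seven occurrences of the second coding scheme, so the result is the second coding scheme; in the second call it holds six, so the result is the first coding scheme.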
In addition to the implementations described above, the encoder side may use other methods to obtain the sound field classification result of the current frame, and other methods may also be used to determine the initial coding scheme based on the sound field classification result. The embodiments of this application do not limit this.
In the embodiments of this application, after the encoder side determines the initial coding scheme of the current frame, if the initial coding scheme of the current frame is the same as the initial coding scheme of the frame preceding the current frame, the encoder side determines that the coding scheme of the current frame is the initial coding scheme of the current frame. If the initial coding scheme of the current frame is different from the initial coding scheme of the preceding frame, the encoder side determines that the coding scheme of the current frame is the third coding scheme. That is, if the two initial coding schemes are the same and both are the first coding scheme, the coding scheme of the current frame is the first coding scheme; if the two are the same and both are the second coding scheme, the coding scheme of the current frame is the second coding scheme; and if one of the two is the first coding scheme and the other is the second coding scheme, the coding scheme of the current frame is the third coding scheme. The last case covers both directions: the initial coding scheme of the current frame is the first coding scheme and that of the preceding frame is the second coding scheme, or the initial coding scheme of the current frame is the second coding scheme and that of the preceding frame is the first coding scheme. In other words, for a switching frame, the encoder side uses neither the first coding scheme nor the second coding scheme to encode the HOA signal of the switching frame, but uses a switching frame coding scheme instead. For a non-switching frame, the encoder side encodes the HOA signal of the non-switching frame using the coding scheme that matches the initial coding scheme of that frame. An audio frame whose initial coding scheme differs from that of the preceding frame is a switching frame, and an audio frame whose initial coding scheme is the same as that of the preceding frame is a non-switching frame.
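The rule above, applied across a run of frames, can be sketched as follows. The function name and the label `SWITCH` are hypothetical, and treating the very first frame (which has no preceding frame) as a non-switching frame is an assumption of this sketch that the text does not address.

```python
from typing import List, Sequence

DIRAC = "DirAC"    # first coding scheme
MP = "MP"          # second coding scheme
SWITCH = "switch"  # third coding scheme (switching frame coding scheme)


def final_schemes(initial_schemes: Sequence[str]) -> List[str]:
    """Map per-frame initial coding schemes to per-frame coding schemes."""
    result = []
    previous = None
    for current in initial_schemes:
        if previous is None or current == previous:
            # Non-switching frame: keep the initial coding scheme.
            # (Assumption: the first frame is treated as non-switching.)
            result.append(current)
        else:
            # Initial scheme differs from the preceding frame's initial scheme.
            result.append(SWITCH)
        previous = current  # compare initial schemes, not final ones
    return result


print(final_schemes([DIRAC, DIRAC, MP, MP, DIRAC]))
```

Note that the comparison always uses the preceding frame's initial coding scheme, so the frame after a switching frame is a non-switching frame if its initial coding scheme is unchanged.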
It should be noted that, in addition to determining the coding scheme of the current frame, the encoder side also needs to encode into the bitstream information that indicates the coding scheme of the current frame, so that the decoder side can determine which decoding scheme to use to decode the bitstream of the current frame. In the embodiments of this application, there are multiple ways for the encoder side to encode this information into the bitstream; three of them are described next.
First implementation: encoding a switching flag and indication information of the two coding schemes
In this implementation, the encoder side determines the value of the switching flag of the current frame and encodes that value into the bitstream. When the coding scheme of the current frame is the first coding scheme or the second coding scheme, the value of the switching flag of the current frame is a first value. When the coding scheme of the current frame is the third coding scheme, the value of the switching flag of the current frame is a second value. Optionally, the first value is "0" and the second value is "1"; the first value and the second value may also be other values.
In addition, the encoder side encodes the indication information of the initial coding scheme of the current frame into the bitstream. Alternatively, if the value of the switching flag of the current frame is the first value, the encoder side encodes the indication information of the initial coding scheme of the current frame into the bitstream, and if the value of the switching flag of the current frame is the second value, the encoder side encodes preset indication information into the bitstream.
Optionally, the indication information of the initial coding scheme is represented by the coding mode corresponding to the initial coding scheme; that is, the coding mode serves as the indication information. For example, the coding mode corresponding to the initial coding scheme is an initial coding mode, and the initial coding mode is a first coding mode (the DirAC mode) or a second coding mode (the MP mode). Optionally, the preset indication information is a preset coding mode, and the preset coding mode is the first coding mode or the second coding mode. In some other embodiments, the preset indication information is another coding mode; that is, the specific indication information of the coding scheme of a switching frame that is encoded into the bitstream is not limited.
In other words, in the first implementation, the encoder side uses the switching flag to indicate a switching frame, and the indication information of the coding scheme of the switching frame that is encoded into the bitstream need not be limited: it may be the initial coding mode, a preset coding mode, a mode selected at random from the first coding mode and the second coding mode, or other indication information. It should be noted that, because the switching flag indicates whether the current frame is a switching frame, the decoder side can determine this directly by reading the switching flag from the bitstream.
Optionally, in the first implementation, the switching flag of the current frame and the indication information of the initial coding scheme each occupy one bit of the bitstream. For example, the value of the switching flag of the current frame is "0" or "1". A value of "0" indicates that the current frame is not a switching frame, that is, the switching flag has the first value. A value of "1" indicates that the current frame is a switching frame, that is, the switching flag has the second value. Optionally, the indication information of the initial coding scheme is "0" or "1", where "0" denotes the DirAC mode (the DirAC-based coding scheme) and "1" denotes the MP mode (the MP-based coding scheme).
In some other embodiments, if the initial coding scheme of the current frame is different from the initial coding scheme of the frame preceding the current frame, the encoder side determines that the value of the switching flag of the current frame is the second value and encodes that value into the bitstream. That is, for a switching frame, because the switching flag in the bitstream already identifies the switching frame, the indication information of the coding scheme of the switching frame does not need to be encoded.
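The two-bit signalling of the first implementation can be sketched as follows. The bit values follow the optional example in the text ("0"/"1" for the flag, "0" for DirAC and "1" for MP); the order of the two bits, the function names, and the decoder-side helper are assumptions of this sketch.

```python
from typing import List, Tuple

MODE_BITS = {"DirAC": 0, "MP": 1}         # indication of the initial scheme
BIT_MODES = {0: "DirAC", 1: "MP"}


def encode_flag_and_mode(initial_mode: str,
                         is_switching_frame: bool) -> List[int]:
    """Return [switching_flag, mode_bit]; the bit order is an assumption."""
    switching_flag = 1 if is_switching_frame else 0
    return [switching_flag, MODE_BITS[initial_mode]]


def decode_flag_and_mode(bits: List[int]) -> Tuple[bool, str]:
    """Read the flag first; the mode bit is only decisive when flag == 0."""
    switching_flag, mode_bit = bits
    return switching_flag == 1, BIT_MODES[mode_bit]


bits = encode_flag_and_mode("MP", is_switching_frame=True)
print(bits, decode_flag_and_mode(bits))
```

For a switching frame the decoder side can act on the flag alone, which is why the text allows the accompanying mode bit to carry the initial coding mode, a preset coding mode, or be omitted.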
Second implementation: encoding indication information of the two coding schemes
In this implementation, the encoder side encodes the indication information of the initial coding scheme of the current frame into the bitstream. Taking the coding mode as the indication information, the indication information encoded into the bitstream is in effect the coding mode that matches the initial coding scheme, that is, the initial coding mode, which is the first coding mode or the second coding mode. In addition, the encoder side need not encode a switching flag.
Optionally, in the second implementation, the indication information of the initial coding scheme occupies one bit of the bitstream. For example, taking the coding mode as the indication information, the coding mode encoded into the bitstream is "0" or "1", where "0" denotes the DirAC mode and indicates that the initial coding scheme of the current frame is the first coding scheme, and "1" denotes the MP mode and indicates that the initial coding scheme of the current frame is the second coding scheme.
Third implementation: encoding indication information of the three coding schemes
In this implementation, the encoder side encodes the indication information of the coding scheme of the current frame into the bitstream. Taking the coding mode as the indication information, the indication information encoded into the bitstream is in effect the coding mode that matches the coding scheme of the current frame. This mode is the actual coding mode, which is the first coding mode, the second coding mode, or a third coding mode. Optionally, the third coding mode is the MP-W mode.
Optionally, in the third implementation, the indication information of the coding scheme of the current frame occupies two bits of the bitstream. For example, the indication information of the coding scheme of the current frame is "00", "01", or "10", where "00" indicates that the coding scheme of the current frame is the first coding scheme, "01" indicates the second coding scheme, and "10" indicates the third coding scheme.
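The two-bit mapping of the third implementation can be sketched as a lookup table. The code values are the optional example from the text; the function names and the decision to reject the unused value "11" with an error are assumptions of this sketch.

```python
SCHEME_TO_BITS = {
    "DirAC": "00",  # first coding scheme
    "MP": "01",     # second coding scheme
    "MP-W": "10",   # third coding scheme (switching frame)
}
BITS_TO_SCHEME = {bits: scheme for scheme, bits in SCHEME_TO_BITS.items()}


def encode_scheme(scheme: str) -> str:
    """Return the two-bit indication for the frame's coding scheme."""
    return SCHEME_TO_BITS[scheme]


def decode_scheme(bits: str) -> str:
    """Return the coding scheme for a two-bit indication."""
    if bits not in BITS_TO_SCHEME:
        # Assumption: "11" is unassigned in the example, so it is rejected.
        raise ValueError(f"unassigned coding scheme indication: {bits!r}")
    return BITS_TO_SCHEME[bits]


print(encode_scheme("MP-W"), decode_scheme("01"))
```

Compared with the first implementation, this variant needs no separate switching flag, because the switching frame has its own code.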
As can be seen from the above, in the first implementation, after determining the initial coding scheme of the current frame, the encoder side determines the value of the switching flag and encodes it into the bitstream. In addition, the encoder side encodes the indication information of the initial coding scheme of the current frame into the bitstream; or, if the current frame is a switching frame, the encoder side encodes preset indication information into the bitstream, and if the current frame is a non-switching frame, the encoder side encodes the indication information of the initial coding scheme of the current frame into the bitstream. In the second implementation, after determining the initial coding scheme of the current frame, the encoder side directly encodes the indication information of the initial coding scheme of the current frame into the bitstream. In the third implementation, after determining the initial coding scheme of the current frame, the encoder side determines the coding scheme of the current frame based on the initial coding scheme of the current frame and the initial coding scheme of the preceding frame, and encodes the indication information of the coding scheme of the current frame into the bitstream.
Step 602: If the coding scheme of the current frame is the third coding scheme, encode the signals of specified channels of the HOA signal into the bitstream, where the specified channels are a subset of all channels of the HOA signal.
In the embodiments of this application, if the coding scheme of the current frame is the third coding scheme, which means that the current frame is a switching frame, the encoder side encodes the HOA signal of the current frame according to the third coding scheme (the hybrid coding scheme). Corresponding to the first implementation in step 601, a switching flag of the current frame equal to the second value indicates that the current frame is a switching frame. Corresponding to the second implementation in step 601, an initial coding scheme of the current frame that differs from the initial coding scheme of the preceding frame indicates that the current frame is a switching frame. Corresponding to the third implementation in step 601, a coding scheme of the current frame equal to the third coding scheme indicates that the current frame is a switching frame. For a switching frame, the encoder side uses the third coding scheme to encode the HOA signal of the current frame. The third coding scheme specifies that the signals of specified channels of the HOA signal of the current frame are encoded into the bitstream, where the specified channels are a subset of all channels of the HOA signal. In other words, for a switching frame, the encoder side encodes the signals of the specified channels of the switching frame's HOA signal into the bitstream rather than encoding the switching frame with the first coding scheme or the second coding scheme. This solution thus encodes switching frames in a compromise manner so that the auditory quality transitions smoothly when the coding scheme switches.
Optionally, the specified channels are consistent with the transmission channels preset in the first coding scheme; that is, the specified channels are preset channels. In other words, given that the third coding scheme differs from the second coding scheme, in order to make the coding effect of the third coding scheme close to that of the second coding scheme, the encoder side encodes into the bitstream the signals of those channels of the switching frame's HOA signal that are the same as the transmission channels preset in the first coding scheme, so that the auditory quality transitions as smoothly as possible. It should be noted that different transmission channels can be preset for different coding bandwidths, bit rates, or even application scenarios. Optionally, the preset transmission channels may also be the same across different coding bandwidths, bit rates, or application scenarios.
Optionally, the signals of the specified channels include an FOA signal, and the FOA signal includes an omnidirectional W signal and directional X, Y, and Z signals. That is, the specified channels include the FOA channels, and the signals of the FOA channels are low-order signals. If the current frame is a switching frame, the encoder side encodes the low-order part of the HOA signal of the current frame into the bitstream, where the low-order part consists of the W, X, Y, and Z signals of the FOA channels.
It should be noted that, in the embodiments of this application, there are many ways for the encoder side to encode the signals of the specified channels of the HOA signal into the bitstream; any way that encodes these signals into the bitstream is acceptable. Some of these implementations are described next.
In the embodiments of this application, if the specified channels include the FOA channels, the encoder side determines a virtual speaker signal and residual signals based on the W, X, Y, and Z signals, and encodes the virtual speaker signal and the residual signals into the bitstream.
Optionally, the encoder side determines the W signal as one virtual speaker signal and determines three residual signals based on the W, X, Y, and Z signals; or it determines the X, Y, and Z signals as the three residual signals. Optionally, the encoder side determines, as the three residual signals, the difference signals between any three of the W, X, Y, and Z signals and the remaining one. For example, the encoder side determines the difference signals between each of the X, Y, and Z signals and the W signal as the three residual signals; that is, the difference signals X', Y', and Z' obtained from X-W, Y-W, and Z-W are used as the three residual signals.
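The X-W, Y-W, Z-W example above can be sketched as follows. This is a minimal illustration that treats each channel as a list of samples (or frequency-bin values) of equal length; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def foa_to_virtual_speaker_and_residuals(
    w: Sequence[float],
    x: Sequence[float],
    y: Sequence[float],
    z: Sequence[float],
) -> Tuple[List[float], List[List[float]]]:
    """Use W as the single virtual speaker signal and X-W, Y-W, Z-W as residuals."""
    virtual_speaker = list(w)
    residuals = [
        [value - w_value for value, w_value in zip(channel, w)]
        for channel in (x, y, z)
    ]
    return virtual_speaker, residuals


w = [1.0, 2.0, 3.0]
x = [1.5, 2.0, 2.0]
y = [0.0, 2.0, 4.0]
z = [1.0, 1.0, 1.0]
virtual_speaker, (x_res, y_res, z_res) = foa_to_virtual_speaker_and_residuals(w, x, y, z)
print(virtual_speaker, x_res, y_res, z_res)
```

The other option named in the text, using X, Y, and Z directly as the residual signals, corresponds to skipping the subtraction.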
If the encoder side uses a core encoder to encode the current frame and the core encoder is a stereo encoder, then, because the one virtual speaker signal and the three residual signals are all mono signals, the encoder side first needs to combine these mono signals into stereo signals and then encode them with the stereo encoder. Optionally, the encoder side combines the virtual speaker signal with a first preset mono signal to obtain one stereo signal, and combines the three residual signals with a second preset mono signal to obtain two stereo signals. The encoder side encodes the resulting three stereo signals into the bitstream separately by using the stereo encoder.
The embodiments of this application do not limit the specific way in which the encoder side combines the three residual signals with one preset mono signal to obtain two stereo signals. Optionally, the encoder side combines the two most highly correlated of the three residual signals to obtain one of the two stereo signals, and combines the remaining residual signal with the second preset mono signal to obtain the other stereo signal. That is, the encoder side forms the stereo signals according to signal correlation. In some other embodiments, the encoder side may instead combine any two of the three residual signals to obtain one of the two stereo signals, and combine the remaining residual signal with the second preset mono signal to obtain the other stereo signal.
Optionally, in the embodiments of this application, the first preset mono signal is an all-zero signal or an all-one signal, and the second preset mono signal is an all-zero signal or an all-one signal. Optionally, the first preset mono signal and the second preset mono signal are the same or different: both are all-zero signals or both are all-one signals; or the first is an all-zero signal and the second is an all-one signal; or the first is an all-one signal and the second is an all-zero signal. If the HOA signal is a time-domain signal, an all-zero signal is a signal whose sample values are all zero, and an all-one signal is a signal whose sample values are all one. If the HOA signal is a frequency-domain signal, an all-zero signal is a signal whose frequency bin values are all zero, and an all-one signal is a signal whose frequency bin values are all one. In some other embodiments, the first preset mono signal and/or the second preset mono signal may also be preset signals of another form.
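The correlation-based pairing for a stereo core encoder can be sketched as follows. The text does not define how correlation is measured, so the normalized cross-correlation used here is an assumption, as are the function names; the preset mono signals are passed in by the caller (for example all-zero signals).

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Signal = Sequence[float]


def normalized_correlation(a: Signal, b: Signal) -> float:
    """Absolute normalized cross-correlation (an assumed correlation measure)."""
    numerator = sum(p * q for p, q in zip(a, b))
    denominator = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return abs(numerator) / denominator if denominator > 0 else 0.0


def pair_for_stereo(virtual_speaker: Signal,
                    residuals: Sequence[Signal],
                    first_preset: Signal,
                    second_preset: Signal) -> List[Tuple[Signal, Signal]]:
    """Build three stereo pairs from one virtual speaker and three residuals."""
    # Pick the two residual signals with the highest mutual correlation.
    i, j = max(
        combinations(range(3), 2),
        key=lambda ij: normalized_correlation(residuals[ij[0]], residuals[ij[1]]),
    )
    k = ({0, 1, 2} - {i, j}).pop()  # index of the remaining residual signal
    return [
        (virtual_speaker, first_preset),
        (residuals[i], residuals[j]),
        (residuals[k], second_preset),
    ]


zeros = [0.0, 0.0, 0.0]                      # all-zero preset mono signal
w = [1.0, 2.0, 3.0]
res = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, -1.0, 0.0]]
for left, right in pair_for_stereo(w, res, zeros, zeros):
    print(left, right)
```

In this hypothetical input the first two residual signals are proportional to each other, so they are paired, and the third is paired with the second preset mono signal.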
If the core encoder used by the encoding end is a mono encoder, the encoding end uses the mono encoder to encode the one virtual speaker signal and each of the three residual signals into the bitstream separately.
FIG. 7 is a schematic diagram of a switching frame encoding scheme provided by an embodiment of this application. Referring to FIG. 7, the current frame to be encoded is a switching frame. The encoding end obtains the HOA signal of the current frame, uses the W signal in the HOA signal as the virtual speaker signal, and determines the residual signals according to the FOA signal in the HOA signal, for example according to the X, Y and Z signals in the HOA signal, or according to the W signal together with the X, Y and Z signals. The encoding end encodes the determined virtual speaker signal and residual signals into the bitstream through the core encoder to obtain the bitstream of the switching frame.
Optionally, in other embodiments, the encoding end determines two of the W, X, Y and Z signals as two virtual speaker signals and determines the remaining two signals as two residual signals. The encoding end combines the two virtual speaker signals to obtain one stereo signal and combines the two residual signals to obtain another stereo signal. The encoding end then encodes the two resulting stereo signals into the bitstream separately through a stereo encoder.
The embodiments of this application do not limit the specific manner in which the encoding end combines the W, X, Y and Z signals in pairs to obtain the two stereo signals. Optionally, the encoding end determines the W signal as one virtual speaker signal and determines, among the X, Y and Z signals, the signal most highly correlated with the W signal as the other virtual speaker signal. In other words, of the four signals of the FOA channels, the W signal is combined with the signal most highly correlated with it, and the remaining two signals are combined with each other. Alternatively, the encoding end combines any two of the W, X, Y and Z signals to obtain one stereo signal and combines the remaining two signals to obtain the other stereo signal.
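A minimal sketch of the optional W-based pairing is given below. The correlation measure and the function names are assumptions introduced for illustration; the text only states that W is paired with whichever of X, Y and Z is most highly correlated with it.

```python
from math import sqrt


def normalized_correlation(a, b):
    """Absolute zero-lag normalized correlation (an assumed similarity measure)."""
    num = sum(x * y for x, y in zip(a, b))
    den = sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return abs(num) / den if den > 0 else 0.0


def pair_foa_channels(foa):
    """Split the FOA channels into a virtual-speaker pair and a residual pair.

    foa is a dict with keys "W", "X", "Y", "Z" mapping to equal-length
    sample (or frequency-bin) sequences. Returns two tuples of channel names.
    """
    directional = ("X", "Y", "Z")
    # The directional channel most correlated with W joins W.
    partner = max(directional,
                  key=lambda name: normalized_correlation(foa["W"], foa[name]))
    rest = tuple(name for name in directional if name != partner)
    return ("W", partner), rest
```

For a frame in which X follows W closely, the function yields the pairs `("W", "X")` and `("Y", "Z")`, which matches the example given later for the decoding side.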
It should be noted that the embodiments of this application do not limit the specific manner in which the encoding end uses the core encoder to encode the virtual speaker signal and the residual signal; for example, the number of coding bits allocated to the virtual speaker signal and to the residual signal is not limited.
The above describes how the encoding end encodes the current frame when the current frame is a switching frame; that is, the encoding end encodes the signals of the specified channels in the HOA signal of the switching frame into the bitstream according to the third encoding scheme, which is the switching frame encoding scheme. As described above, in the embodiments of this application the signals of the specified channels may include the W signal, which is a core signal of the HOA signal, so the switching frame encoding scheme may also be called an MP-W-based encoding scheme. The following describes how the encoding end encodes the current frame when the current frame is a non-switching frame.
In the embodiments of this application, if the encoding scheme of the current frame is the first encoding scheme, the encoding end encodes the HOA signal of the current frame into the bitstream according to the first encoding scheme. If the encoding scheme of the current frame is the second encoding scheme, the encoding end encodes the HOA signal of the current frame into the bitstream according to the second encoding scheme. That is, if the current frame is not a switching frame, the encoding end encodes the current frame with the initial encoding scheme of the current frame.
For example, referring to FIG. 8, the encoding end encodes the HOA signal of the current frame into the bitstream according to the second encoding scheme as follows: based on the MP algorithm, the encoding end selects from the virtual speaker set a target virtual speaker that matches the HOA signal of the current frame; based on the HOA signal of the current frame and the target virtual speaker, it determines the virtual speaker signal through the MP-based spatial encoder; based on the HOA signal of the current frame and the virtual speaker signal, it determines the residual signal through the MP-based spatial encoder; and it encodes the virtual speaker signal and the residual signal into the bitstream through the core encoder. It should be noted that the MP-based HOA encoding scheme and the switching frame encoding scheme differ in the principle and specific manner of determining the virtual speaker signal and the residual signal, and the virtual speaker signals and residual signals they determine also differ. For the same frame, the MP-based HOA encoding scheme encodes more effective information into the bitstream than the switching frame encoding scheme does. Given that the switching frame encoding scheme differs from the second encoding scheme, and in order to bring the coding effect of the switching frame encoding scheme close to that of the second encoding scheme, the switching frame encoding scheme likewise encodes a virtual speaker signal and a residual signal into the bitstream, so that the auditory quality transitions as smoothly as possible.
The encoding end encodes the HOA signal of the current frame into the bitstream according to the first encoding scheme as follows: the encoding end extracts the core layer signal and the spatial parameters from the HOA signal of the current frame and encodes them into the bitstream. For example, referring to FIG. 9, the encoding end extracts the core layer signal from the HOA signal of the current frame through the core encoded signal acquisition module, extracts the spatial parameters from the HOA signal of the current frame through the DirAC-based spatial parameter extraction module, encodes the core layer signal into the bitstream through the core encoder, and encodes the spatial parameters into the bitstream through the spatial parameter encoder. The channels corresponding to the core layer signal are consistent with the specified channels in this solution. In addition to the core layer signal, the first encoding scheme also encodes the extracted spatial parameters into the bitstream; the spatial parameters carry rich scene information, such as direction information. For the same frame, therefore, the DirAC-based HOA encoding scheme also encodes more effective information into the bitstream than the switching frame encoding scheme does. Given that the switching frame encoding scheme differs from the first encoding scheme, and in order to bring the coding effect of the switching frame encoding scheme close to that of the first encoding scheme, the switching frame encoding scheme likewise encodes into the bitstream the signals of the HOA channels that correspond to the transmission channels preset by the first encoding scheme. It does not, however, encode any information in the HOA signal beyond the signals of the specified channels; that is, it neither extracts spatial parameters nor encodes them into the bitstream. In this way the auditory quality transitions as smoothly as possible.
FIG. 10 is a flowchart of another encoding method provided by an embodiment of this application. Referring to FIG. 10, the encoding method provided by the embodiments of this application is explained again, taking as an example the case in which the indication information of the initial encoding scheme of the current frame is encoded into the bitstream. The encoding end first obtains the HOA signal of the current frame to be encoded. The encoding end then performs sound field type analysis on the HOA signal to determine the initial encoding scheme of the current frame, and encodes the indication information of the initial encoding scheme of the current frame into the bitstream. The encoding end determines whether the initial encoding scheme of the current frame is the same as that of the previous frame. If they are the same, the encoding end encodes the HOA signal of the current frame with the initial encoding scheme of the current frame to obtain the bitstream of the current frame. If they are different, the encoding end encodes the HOA signal of the current frame with the switching frame encoding scheme to obtain the bitstream of the current frame.
It should be noted that if the current frame is the first audio frame to be encoded, the initial encoding scheme of the current frame is the first encoding scheme or the second encoding scheme, and the encoding end encodes the HOA signal of the current frame into the bitstream with the initial encoding scheme of the current frame.
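The per-frame decision of FIG. 10 can be sketched as follows. The constants and the function name are hypothetical, and the sound field type analysis that produces each frame's initial scheme is outside the scope of the sketch; the input is simply the sequence of initial schemes. Note that the comparison uses the previous frame's initial scheme, not the scheme actually used to encode it.

```python
FIRST = "first"     # DirAC-based HOA encoding scheme
SECOND = "second"   # virtual-speaker-selection (MP-based) HOA encoding scheme
SWITCH = "switch"   # switching frame (third) encoding scheme


def select_encoding_schemes(initial_schemes):
    """Map each frame's initial scheme to the scheme actually used.

    A frame whose initial scheme differs from the previous frame's initial
    scheme is encoded as a switching frame; the first frame always uses its
    own initial scheme.
    """
    used = []
    previous_initial = None
    for initial in initial_schemes:
        if previous_initial is None or initial == previous_initial:
            used.append(initial)
        else:
            used.append(SWITCH)
        previous_initial = initial  # track the initial scheme, not the used one
    return used
```

For the initial-scheme sequence first, first, second, second, first, the third and fifth frames are switching frames, so each change of scheme costs exactly one switching frame.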
In summary, the embodiments of this application encode and decode the HOA signals of audio frames by combining two schemes (a codec scheme based on virtual speaker selection and a codec scheme based on directional audio coding); that is, a suitable codec scheme is selected for each audio frame, which can improve the compression rate of the audio signal. At the same time, to make the auditory quality transition smoothly when switching between the codec schemes, some audio frames are not encoded directly with either of the two schemes. Instead, a new codec scheme is used for these frames, in which the signals of the specified channels in their HOA signals are encoded into the bitstream. This compromise scheme allows the auditory quality to transition smoothly when the HOA signals recovered by decoding are rendered and played back.
FIG. 11 is a flowchart of a decoding method provided by an embodiment of this application. The method is applied to the decoding end. It should be noted that this decoding method corresponds to the encoding method shown in FIG. 6. Referring to FIG. 11, the method includes the following steps.
Step 1101: Obtain the decoding scheme of the current frame based on the bitstream.
The decoding scheme of the current frame is one of the first decoding scheme, the second decoding scheme and the third decoding scheme. The first decoding scheme is the DirAC-based HOA decoding scheme, the second decoding scheme is the HOA decoding scheme based on virtual speaker selection, and the third decoding scheme is a hybrid decoding scheme. Optionally, the hybrid decoding scheme is also called the switching frame decoding scheme.
It should be noted that because the encoding end encodes different audio frames with different encoding schemes, the decoding end needs to decode each audio frame with the corresponding decoding scheme.
The following first describes how the decoding end determines the decoding scheme of the current frame. As described above, step 601 of the encoding method shown in FIG. 6 introduces three implementations by which the encoding end encodes into the bitstream information that can indicate the encoding scheme of the current frame. Correspondingly, there are three implementations by which the decoding end determines the decoding scheme of the current frame, which are described next.
First implementation: a switching flag and indication information for two encoding schemes are encoded
The decoding end first parses the value of the switching flag of the current frame from the bitstream. If the value of the switching flag is the first value, the decoding end then parses the indication information of the decoding scheme of the current frame from the bitstream; the indication information indicates whether the decoding scheme of the current frame is the first decoding scheme or the second decoding scheme. If the value of the switching flag is the second value, the decoding end determines that the decoding scheme of the current frame is the third decoding scheme. It should be noted that the indication information of the encoding scheme that the encoding end encodes into the bitstream is the indication information of the decoding scheme that the decoding end parses from the bitstream.
In other words, if the decoding end parses the switching flag of the current frame and its value is the first value, the current frame is a non-switching frame. The decoding end then parses the indication information of the decoding scheme from the bitstream and determines the decoding scheme of the current frame based on it. If the value of the switching flag is the second value, the current frame is a switching frame, and the decoding end does not need to decode the indication information even if the bitstream contains it.
It should be noted that if the value of the switching flag is the second value, the decoding end determines that the decoding scheme of the current frame is the switching frame decoding scheme and that the current frame is a switching frame. The switching frame decoding scheme differs from both the first decoding scheme and the second decoding scheme, and its purpose is a smooth transition of auditory quality.
Optionally, in the first implementation, the indication information of the decoding scheme and the switching flag each occupy one bit of the bitstream. For example, the decoding end first parses the value of the switching flag of the current frame from the bitstream. If the parsed value is "0", that is, the first value, the decoding end then parses the indication information of the decoding scheme of the current frame from the bitstream: if the parsed indication information is "0", the decoding end determines that the decoding scheme of the current frame is the first decoding scheme, and if it is "1", the decoding end determines that the decoding scheme of the current frame is the second decoding scheme. If the parsed value of the switching flag is "1", the decoding end determines that the decoding scheme of the current frame is the switching frame decoding scheme (the third decoding scheme).
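The first implementation can be sketched as a small parser over a list of bits. The bit values follow the example above ("0" for a non-switching frame, then "0"/"1" for the first or second scheme). The function name, the list-of-bits representation and the `indicator_always_present` parameter are assumptions made for illustration; the text leaves open whether the indication bit is written for a switching frame, so the sketch merely skips that bit without interpreting it when it is present.

```python
def parse_scheme_with_switch_flag(bits, pos=0, indicator_always_present=False):
    """Return (scheme, next_pos) for one frame.

    bits is a sequence of 0/1 integers and pos is the read position of the
    switching flag. The scheme is "first", "second" or "third".
    """
    switch_flag = bits[pos]
    pos += 1
    if switch_flag == 1:
        # Switching frame: the indication bit, if written, is not decoded.
        if indicator_always_present:
            pos += 1
        return "third", pos
    indicator = bits[pos]
    pos += 1
    return ("first" if indicator == 0 else "second"), pos
```

Returning the next read position lets a caller continue parsing the remaining payload of the frame from the correct offset.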
Second implementation: indication information for two encoding schemes is encoded
The decoding end parses the initial decoding scheme of the current frame from the bitstream; the initial decoding scheme is the first decoding scheme or the second decoding scheme. If the initial decoding scheme of the current frame is the same as that of the previous frame, the decoding scheme of the current frame is determined to be the initial decoding scheme of the current frame. If the initial decoding scheme of the current frame differs from that of the previous frame, the decoding scheme of the current frame is determined to be the third decoding scheme, that is, the hybrid decoding scheme. The two initial decoding schemes differ when the initial decoding scheme of the current frame is the first decoding scheme and that of the previous frame is the second decoding scheme, or when the initial decoding scheme of the current frame is the second decoding scheme and that of the previous frame is the first decoding scheme. That is, one of the two initial decoding schemes is the first decoding scheme and the other is the second decoding scheme.
Optionally, in the second implementation, the indication information that indicates the initial encoding scheme occupies one bit of the bitstream. Taking the coding mode as the indication information, the coding mode occupies one bit in the bitstream. For example, the decoding end parses the indication information of the initial encoding scheme of the current frame from the bitstream. If the parsed indication information is "0" and the indication information of the previous frame is also "0", the decoding end determines that the decoding scheme of the current frame is the first decoding scheme. If the parsed indication information is "1" and the indication information of the previous frame is also "1", the decoding end determines that the decoding scheme of the current frame is the second decoding scheme. If the parsed indication information is "0" and that of the previous frame is "1", or the parsed indication information is "1" and that of the previous frame is "0", the decoding end determines that the decoding scheme of the current frame is the switching frame decoding scheme.
Optionally, the indication information of the initial decoding scheme of the previous frame is cached data. When decoding the current frame, the decoding end can obtain the indication information of the initial decoding scheme of the previous frame from the cache.
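The second implementation, including the cached indicator of the previous frame, can be sketched as a small stateful helper. The class name and return labels are hypothetical. The handling of the very first frame (no cached indicator yet) is an assumption that mirrors the encoding side, where the first frame is encoded with its own initial scheme.

```python
class InitialSchemeTracker:
    """Derive the decoding scheme from a one-bit indicator per frame."""

    def __init__(self):
        self._previous_indicator = None  # cached indicator of the previous frame

    def decode_scheme(self, indicator):
        """indicator is 0 (first scheme) or 1 (second scheme)."""
        if indicator not in (0, 1):
            raise ValueError("indicator must be 0 or 1")
        previous = self._previous_indicator
        self._previous_indicator = indicator  # update the cache for the next frame
        if previous is not None and previous != indicator:
            return "third"  # initial schemes differ: switching frame
        return "first" if indicator == 0 else "second"
```

Because the cache stores the parsed indicator rather than the scheme finally used, a frame that follows a switching frame with the same indicator is decoded with its initial scheme, as in the example above.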
Third implementation: indication information for three encoding schemes is encoded
The decoding end parses the indication information of the decoding scheme of the current frame from the bitstream. The indication information indicates whether the decoding scheme of the current frame is the first decoding scheme, the second decoding scheme or the third decoding scheme.
Optionally, in the third implementation, the indication information of the decoding scheme occupies two bits of the bitstream. For example, if the coding mode is used as the indication information, the coding mode of the current frame occupies two bits of the bitstream.
For example, the decoding end parses the indication information of the decoding scheme of the current frame from the bitstream. If the parsed indication information is "00", the decoding end determines that the decoding scheme of the current frame is the first decoding scheme. If it is "01", the decoding end determines that the decoding scheme of the current frame is the second decoding scheme. If it is "10", the decoding end determines that the decoding scheme of the current frame is the switching frame decoding scheme.
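The two-bit mapping of the third implementation amounts to a lookup table, sketched below with hypothetical names. The text assigns no meaning to the code "11", so the sketch treats it as an error; that choice is an assumption.

```python
SCHEME_BY_CODE = {
    "00": "first",   # DirAC-based HOA decoding scheme
    "01": "second",  # virtual-speaker-selection HOA decoding scheme
    "10": "third",   # switching frame decoding scheme
}


def scheme_from_two_bits(code):
    """Map a two-character bit string to a decoding scheme label."""
    try:
        return SCHEME_BY_CODE[code]
    except KeyError:
        raise ValueError(f"unassigned indication code: {code!r}") from None
```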
Step 1102: If the decoding scheme of the current frame is the third decoding scheme, determine the signals of the specified channels in the HOA signal of the current frame based on the bitstream, where the specified channels are a subset of all the channels of the HOA signal.
In the embodiments of this application, after the decoding end obtains the decoding scheme of the current frame, if the decoding scheme of the current frame is the third decoding scheme, which indicates that the current frame is a switching frame, the decoding end determines the signals of the specified channels in the HOA signal of the current frame based on the bitstream. That is, for a switching frame, the encoding end encodes the signals of the specified channels into the bitstream, so the decoding end, which decodes the switching frame with the switching frame decoding scheme, first needs to parse the signals of the specified channels from the bitstream.
The following describes in detail how the decoding end decodes a switching frame with the switching frame decoding scheme, that is, how the decoding end determines the signals of the specified channels in the HOA signal of the current frame based on the bitstream when the current frame is a switching frame.
It should be noted that the process by which the decoding end determines the signals of the specified channels in the HOA signal of the current frame based on the bitstream mirrors the process by which the encoding end encodes those signals into the bitstream. The foregoing embodiments of the encoding method describe several ways of encoding the signals of the specified channels into the bitstream; the decoding processes that mirror them are described here for the decoding end.
In the embodiments of this application, if the encoding end first determines the virtual speaker signal and the residual signal based on the signals of the specified channels and then encodes the virtual speaker signal and the residual signal into the bitstream, then, correspondingly, the decoding end first determines the virtual speaker signal and the residual signal based on the bitstream and then determines the signals of the specified channels based on the virtual speaker signal and the residual signal.
Optionally, if the encoding end has encoded into the bitstream, through a stereo encoder, three stereo signals obtained by combining the virtual speaker signal and the residual signals, the decoding end decodes the bitstream through a stereo decoder to obtain the three stereo signals and then determines one virtual speaker signal and three residual signals based on them. Optionally, the decoding end determines the one virtual speaker signal based on one of the three stereo signals and determines the three residual signals based on the other two stereo signals. That is, the decoding end first parses the three stereo signals from the bitstream and then splits them to obtain one virtual speaker signal and three residual signals.
For example, the decoding end parses three stereo signals S1, S2 and S3 from the bitstream, where S1 is obtained by combining one virtual speaker signal with one preset mono signal, S2 is obtained by combining two residual signals, and S3 is obtained by combining the remaining residual signal with one preset mono signal. The decoding end splits S1 to obtain the one virtual speaker signal, splits S2 to obtain two residual signals, and splits S3 to obtain the remaining residual signal.
Optionally, if the encoding end has encoded into the bitstream, through a mono encoder, four mono signals determined based on the virtual speaker signal and the residual signals, the decoding end decodes the bitstream through a mono decoder to obtain one virtual speaker signal and three residual signals; these constitute the four mono signals.
Optionally, if the signals of the specified channels include the FOA signal, which consists of the omnidirectional W signal and the directional X, Y and Z signals, then after determining the virtual speaker signal and the residual signals based on the bitstream, the decoding end determines the W signal based on the virtual speaker signal. The decoding end determines the X, Y and Z signals based on the residual signals and the W signal, or based on the residual signals alone. For example, when the decoding end has parsed three residual signals, it determines the sums of the three residual signals and the W signal as the X, Y and Z signals respectively, or it determines the three residual signals themselves as the X, Y and Z signals respectively. Specifically, if the encoding end determined the differences between the X, Y and Z signals and the W signal as the three residual signals, the decoding end determines the sums of the three residual signals and the W signal as the X, Y and Z signals. If the encoding end determined the X, Y and Z signals themselves as the three residual signals, the decoding end determines the three residual signals as the X, Y and Z signals. That is, the decoding process at the decoding end matches the encoding process at the encoding end.
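The two reconstruction variants can be sketched as follows, assuming the W signal has already been recovered from the virtual speaker signal and the three residual signals are ordered as X, Y, Z. The function name, the ordering convention and the `residual_is_difference` parameter are assumptions introduced for illustration, and core-codec quantization error is ignored.

```python
def reconstruct_foa(w, residuals, residual_is_difference):
    """Rebuild the FOA channels from W and three residual signals.

    If the encoder formed each residual as (channel - W), the decoder adds W
    back sample by sample; otherwise the residuals already are X, Y and Z.
    """
    if len(residuals) != 3:
        raise ValueError("expected exactly three residual signals")
    if residual_is_difference:
        xyz = [[r + wv for r, wv in zip(res, w)] for res in residuals]
    else:
        xyz = [list(res) for res in residuals]
    return {"W": list(w), "X": xyz[0], "Y": xyz[1], "Z": xyz[2]}
```

As a round-trip check, with W = [1, 2] and X = [3, 5] the difference residual is [2, 3], and adding W back yields [3, 5] again.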
If the encoding end has encoded into the bitstream, through a stereo encoder, two stereo signals determined based on the virtual speaker signals and the residual signals, the decoding end decodes the bitstream through a stereo decoder to obtain the two stereo signals. The decoding end determines two virtual speaker signals based on one of the two stereo signals and determines two residual signals based on the other; together, the two virtual speaker signals and the two residual signals comprise the W, X, Y and Z signals. Optionally, if the encoding end determined as the two virtual speaker signals the W signal and whichever of the X, Y and Z signals is most highly correlated with the W signal, the two virtual speaker signals determined by the decoding end are the W signal and that most highly correlated signal. Assuming that the X signal is the one most highly correlated with the W signal, the two virtual speaker signals determined by the decoding end are the W signal and the X signal, and the two residual signals determined by the decoding end are the Y signal and the Z signal.
Step 1103: Based on the signal of the specified channel, determine the gains of one or more remaining channels, other than the specified channel, in the HOA signal of the current frame.
In this embodiment of the present application, after determining the signal of the specified channel in the HOA signal of the current frame based on the bitstream, the decoder determines, based on the signal of the specified channel, the gains of the one or more remaining channels in the HOA signal other than the specified channel.
For example, assume that the specified channel is the FOA channel. The FOA channel may be called the low-order channel, and the signal of the FOA channel may be called the low-order part of the HOA signal. The one or more remaining channels in the HOA signal other than the specified channel are called high-order channels, and the signals of the high-order channels may be called the high-order part of the HOA signal. The decoder then determines the high-order gain of the HOA signal, that is, the gains of the high-order channels, based on the low-order part of the HOA signal.
Optionally, the decoder first performs analysis filtering on the signal of the specified channel in the HOA signal to obtain an analysis-filtered signal of the specified channel, and determines the gains of the one or more remaining channels based on the analysis-filtered signal. For example, assuming that the signal of the specified channel is the low-order part of the HOA signal, the decoder first performs analysis filtering on the low-order part to obtain an analysis-filtered low-order part, and then estimates the high-order gain based on the analysis-filtered low-order part. Optionally, for a switching frame in this solution, the analysis filter used by the decoder is the same as the analysis filter used in the DirAC-based HOA decoding scheme, so that the decoding delay of the switching frame is consistent with the decoding delay of the DirAC-based HOA decoding scheme, that is, the delays are aligned. It should be noted that the decoding delay referred to herein is the end-to-end codec delay; the decoding delay may also be referred to as the encoding delay.
It should be noted that, in this embodiment of the present application, the process in which the decoder determines the gains of the one or more remaining channels based on the signal of the specified channel, that is, the process of estimating the remaining-channel gains from the signal of the specified channel, is implemented in the same way as the remaining-channel gain estimation method in the DirAC-based codec scheme, and is not described in detail here. For example, for a switching frame in this solution, the method by which the decoder estimates the high-order gain based on the low-order part of the HOA signal is the same as the high-order gain estimation method in the DirAC-based codec scheme.
Step 1104: Based on the signal of the specified channel and the gains of the one or more remaining channels, determine the signal of each of the one or more remaining channels.
In this embodiment of the present application, the decoder determines the signal of each of the one or more remaining channels based on the signal of the specified channel and the gains of the one or more remaining channels. For example, assuming that the signal of the specified channel is the low-order part of the HOA signal and the gains of the one or more remaining channels are the high-order gain, the decoder may determine the high-order part of the HOA signal based on the W signal in the low-order part and the high-order gain. Alternatively, if the decoder has performed analysis filtering on the low-order part of the HOA signal, the decoder may determine the analysis-filtered high-order part of the HOA signal based on the W signal in the analysis-filtered low-order part and the high-order gain.
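As a rough illustration of step 1104, the sketch below derives each remaining channel by scaling the W signal with a per-channel gain. This simple multiplicative model is an assumption made only for illustration; the application states that the high-order part is determined from the W signal and the high-order gain but does not give the formula. The function name `synthesize_remaining_channels` is hypothetical.

```python
from typing import List, Sequence


def synthesize_remaining_channels(
    w: Sequence[float],
    gains: Sequence[float],
) -> List[List[float]]:
    """Return one signal per remaining channel, assumed here to be gain * W."""
    # Assumption: one scalar gain per remaining (high-order) channel per frame.
    return [[g * sample for sample in w] for g in gains]
```

A real decoder would typically apply such gains per frequency band after analysis filtering, which this sketch omits.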
Step 1105: Based on the signal of the specified channel and the signals of the one or more remaining channels, obtain the reconstructed HOA signal of the current frame.
In this embodiment of the present application, after obtaining the signal of the specified channel and the signals of the one or more remaining channels, the decoder obtains the reconstructed HOA signal of the current frame based on these signals, that is, it reconstructs the HOA signal of the current frame. For example, the decoder performs synthesis filtering on the signal of the specified channel and the signals of the one or more remaining channels to obtain the reconstructed HOA signal of the current frame. For instance, assuming that the signal of the specified channel is the low-order part of the HOA signal and the signals of the one or more remaining channels are the high-order part of the HOA signal, the decoder performs synthesis filtering on the low-order part and the high-order part to obtain the reconstructed HOA signal of the current frame. Alternatively, if the decoder has performed analysis filtering on the low-order part of the HOA signal, the decoder performs synthesis filtering on the analysis-filtered low-order part and the analysis-filtered high-order part to obtain the reconstructed HOA signal of the current frame. Optionally, for a switching frame in this solution, the synthesis filter used by the decoder is the same as the synthesis filter used in the DirAC-based HOA codec scheme, so that the decoding delay of the switching frame is consistent with the decoding delay of the DirAC-based HOA decoding scheme, that is, the delays are aligned.
FIG. 12 is a schematic diagram of a switching-frame decoding scheme provided by an embodiment of the present application. Referring to FIG. 12, the current frame to be decoded is a switching frame. Assuming that the signal of the specified channel is the low-order part of the HOA signal, during decoding the decoder obtains the bitstream of the current frame and performs core decoding on it through a core decoder to reconstruct the low-order part of the HOA signal of the current frame. Using a method similar to the one that determines the high-order part in the DirAC-based HOA decoding scheme, the decoder estimates the high-order part based on the low-order part, that is, it reconstructs the high-order part of the HOA signal. The decoder then reconstructs the HOA signal based on the decoded low-order part and the estimated high-order part.
The foregoing describes how the decoder decodes the current frame when the current frame is a switching frame: the decoder applies the switching-frame decoding scheme, first decoding the signal of the specified channel in the HOA signal (for example, the low-order part) and then reconstructing the signal of each remaining channel (for example, the high-order part). The following describes how the decoder decodes the current frame when the current frame is a non-switching frame.
In this embodiment of the present application, after the decoder determines the decoding scheme of the current frame, if the decoding scheme of the current frame is the first decoding scheme, the decoder obtains the reconstructed HOA signal of the current frame from the bitstream according to the first decoding scheme. If the decoding scheme of the current frame is the second decoding scheme, the decoder obtains the reconstructed HOA signal of the current frame from the bitstream according to the second decoding scheme.
In this embodiment of the present application, referring to FIG. 13, the decoder obtains the reconstructed HOA signal of the current frame from the bitstream according to the second decoding scheme as follows: the decoder parses the virtual speaker signal and the residual signal from the bitstream through the core decoder, and feeds the parsed virtual speaker signal and residual signal into an MP-based spatial decoder to obtain the reconstructed HOA signal of the current frame. It should be noted that the decoding scheme shown in FIG. 13 corresponds to the encoding scheme shown in FIG. 8.
The decoder obtains the reconstructed HOA signal of the current frame from the bitstream according to the first decoding scheme as follows: the decoder parses the core layer signal and the spatial parameters from the bitstream, and reconstructs the HOA signal of the current frame based on the core layer signal and the spatial parameters. For example, referring to FIG. 14, the decoder parses the core layer signal from the bitstream through the core decoder, parses the spatial parameters from the bitstream through a spatial parameter decoder, and performs DirAC-based HOA signal synthesis on the parsed core layer signal and spatial parameters to obtain the reconstructed HOA signal of the current frame. It should be noted that the decoding scheme shown in FIG. 14 corresponds to the encoding scheme shown in FIG. 9.
Optionally, because the high-order part of the HOA signal has a considerable effect on auditory quality, and in order to further smooth the transition in auditory quality when switching between codec schemes, the decoder may also perform gain adjustment on the high-order part of the current frame while obtaining the reconstructed HOA signal of the current frame from the bitstream according to the second decoding scheme. For example, the decoder obtains an initial HOA signal from the bitstream according to the second decoding scheme. If the decoding scheme of the frame preceding the current frame is the third decoding scheme, that is, the preceding frame is a switching frame, the decoder performs gain adjustment on the high-order part of the initial HOA signal according to the high-order gain of the preceding frame. The decoder then obtains the reconstructed HOA signal of the current frame based on the low-order part of the initial HOA signal and the gain-adjusted high-order part.
It should be noted that if the frame preceding the current frame is a switching frame, the high-order gain of the preceding frame is used to adjust the gain of the high-order part of the initial HOA signal of the current frame, so that the gain-adjusted high-order part of the current frame is similar to the high-order part of the preceding frame; for example, the gain adjustment makes the energies of the high-order parts of the HOA signals of the two adjacent frames close to each other. In this way, when the decoder subsequently renders and plays the audio frames, the auditory quality transitions smoothly both at the switching frame and at the frame following the switching frame.
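One possible form of this gain adjustment is sketched below. It is only an assumption for illustration: the application does not specify the adjustment formula. The sketch rescales a high-order channel of the current frame so that its energy equals the squared high-order gain of the preceding frame times the energy of the current W signal, which is the energy that a channel synthesized as gain times W would have. The names `energy` and `adjust_high_order_channel` are hypothetical.

```python
import math
from typing import List, Sequence


def energy(signal: Sequence[float]) -> float:
    """Sum of squared samples."""
    return sum(s * s for s in signal)


def adjust_high_order_channel(
    channel: Sequence[float],
    w: Sequence[float],
    prev_gain: float,      # high-order gain of the preceding (switching) frame
    eps: float = 1e-12,    # guard against division by zero for silent channels
) -> List[float]:
    """Rescale one high-order channel so its energy is prev_gain**2 * energy(w)."""
    e_channel = energy(channel)
    if e_channel <= eps:
        # A silent channel cannot be rescaled meaningfully; return it unchanged.
        return list(channel)
    scale = abs(prev_gain) * math.sqrt(energy(w) / e_channel)
    return [scale * s for s in channel]
```

In practice such an adjustment might be applied per frequency band and smoothed across several frames; the sketch shows only the single-frame, full-band case.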
Optionally, in addition to performing high-order gain adjustment on an audio frame that follows a switching frame and whose decoding scheme is the second decoding scheme, the decoder may also perform gain adjustment on the high-order part of the HOA signal of other audio frames whose decoding scheme is the second decoding scheme. The embodiments of the present application do not limit the specific manner in which gain adjustment is performed on the high-order part of the HOA signals of these audio frames. Optionally, in addition to the high-order part, the decoder may also perform gain adjustment on other parts of the HOA signals of these audio frames. That is, the embodiments of the present application do not limit which channels of the HOA signal are gain-adjusted. In other words, the decoder may perform gain adjustment on the signals of any one or more channels in the HOA signal, and the one or more channels may include some or all of the high-order channels, some or all of the remaining channels other than the specified channel, or other channels.
FIG. 15 is a flowchart of another decoding method provided by an embodiment of the present application. Referring to FIG. 15, take as an example the case in which the encoder encodes the indication information of the initial encoding scheme into the bitstream, and assume that no switching flag is encoded into the bitstream. During decoding, the decoder first parses the indication information of the initial decoding scheme of the current frame from the bitstream. The decoder then determines whether the initial decoding scheme of the current frame is the same as the initial decoding scheme of the preceding frame. If they are the same, the current frame is a non-switching frame, and the decoder decodes the bitstream using the initial decoding scheme of the current frame to obtain the reconstructed HOA signal of the current frame. If they are different, the current frame is a switching frame, and the decoder decodes the bitstream using the switching-frame decoding scheme to obtain the reconstructed HOA signal of the current frame.
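The decision flow of FIG. 15 can be sketched as follows. The enum `Scheme`, the function `select_decoding_scheme`, and the handling of the very first frame (which has no preceding frame and is treated here as a non-switching frame) are assumptions introduced for illustration.

```python
from enum import Enum
from typing import Optional


class Scheme(Enum):
    DIRAC = 1            # first decoding scheme (directional audio coding based)
    VIRTUAL_SPEAKER = 2  # second decoding scheme (virtual speaker selection based)
    SWITCHING = 3        # third decoding scheme (switching-frame decoding)


def select_decoding_scheme(
    current_initial: Scheme,
    previous_initial: Optional[Scheme],
) -> Scheme:
    """Pick the decoding scheme from the parsed initial-scheme indications."""
    # Assumption: with no preceding frame, the frame is decoded as a non-switching frame.
    if previous_initial is None or current_initial == previous_initial:
        return current_initial
    # The initial schemes differ, so the current frame is a switching frame.
    return Scheme.SWITCHING
```

The decoder only needs to remember the initial scheme of the preceding frame, so no switching flag has to be carried in the bitstream for this variant.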
In summary, in the embodiments of the present application, two schemes (a codec scheme based on virtual speaker selection and a codec scheme based on directional audio coding) are combined to encode and decode the HOA signals of audio frames; that is, a suitable codec scheme is selected for each audio frame, which improves the compression rate of the audio signal. In addition, to achieve a smooth transition in auditory quality when switching between codec schemes, certain audio frames are not encoded and decoded directly with either of the two schemes but with a new codec scheme, in which the signal of the specified channel in the HOA signal of these audio frames is encoded into the bitstream. This compromise scheme allows the auditory quality to transition smoothly when the HOA signal recovered by decoding is rendered and played.
FIG. 16 is a schematic structural diagram of an encoding apparatus 1600 provided by an embodiment of the present application. The encoding apparatus 1600 may be implemented by software, hardware, or a combination of the two as part or all of an encoder-side device, and the encoder-side device may be any encoder-side device in the foregoing embodiments. Referring to FIG. 16, the apparatus 1600 includes a first determining module 1601 and a first encoding module 1602.
The first determining module 1601 is configured to determine the encoding scheme of the current frame according to the higher-order ambisonics (HOA) signal of the current frame, where the encoding scheme of the current frame is one of a first encoding scheme, a second encoding scheme, and a third encoding scheme; the first encoding scheme is an HOA encoding scheme based on directional audio coding, the second encoding scheme is an HOA encoding scheme based on virtual speaker selection, and the third encoding scheme is a hybrid encoding scheme;
The first encoding module 1602 is configured to, if the encoding scheme of the current frame is the third encoding scheme, encode the signal of a specified channel in the HOA signal into the bitstream, where the specified channel is a subset of all the channels of the HOA signal.
Optionally, the signal of the specified channel includes a first-order ambisonics (FOA) signal, and the FOA signal includes an omnidirectional W signal and directional X, Y, and Z signals.
Optionally, the first encoding module 1602 includes:
a first determining submodule, configured to determine a virtual speaker signal and a residual signal based on the W signal, the X signal, the Y signal, and the Z signal;
an encoding submodule, configured to encode the virtual speaker signal and the residual signal into the bitstream.
Optionally, the first determining submodule is configured to:
determine the W signal as one virtual speaker signal;
determine three residual signals based on the W signal, the X signal, the Y signal, and the Z signal, or determine the X signal, the Y signal, and the Z signal as the three residual signals.
Optionally, the encoding submodule is configured to:
combine the one virtual speaker signal with a first preset mono signal to obtain one stereo signal;
combine the three residual signals with a second preset mono signal to obtain two stereo signals;
encode the resulting three stereo signals into the bitstream through a stereo encoder.
Optionally, the encoding submodule is configured to:
combine the two most highly correlated residual signals among the three residual signals to obtain one of the two stereo signals;
combine the remaining residual signal, other than the two most highly correlated residual signals, with the second preset mono signal to obtain the other of the two stereo signals.
Optionally, the first preset mono signal is an all-zero signal or an all-one signal, where an all-zero signal is a signal whose sample values are all zero or whose frequency-bin values are all zero, and an all-one signal is a signal whose sample values are all one or whose frequency-bin values are all one; the second preset mono signal is an all-zero signal or an all-one signal; and the first preset mono signal and the second preset mono signal are the same or different.
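The pairing of signals into three stereo signals can be sketched as below. The correlation measure (absolute normalized cross-correlation) and the names `correlation` and `pack_stereo_pairs` are assumptions for illustration; the application does not specify how correlation is computed. Each stereo signal is represented as a tuple of two sample lists.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Stereo = Tuple[List[float], List[float]]


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Absolute normalized cross-correlation at zero lag (assumed measure)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return abs(num) / den if den > 0 else 0.0


def pack_stereo_pairs(
    w: Sequence[float],
    residuals: Sequence[Sequence[float]],
    first_preset: float = 0.0,   # value of the first preset mono signal (all-zero or all-one)
    second_preset: float = 0.0,  # value of the second preset mono signal
) -> List[Stereo]:
    """Return three stereo signals: (W, preset), (most correlated pair), (rest, preset)."""
    if len(residuals) != 3:
        raise ValueError("expected exactly three residual signals")
    n = len(w)
    # Find the two residual signals with the highest mutual correlation.
    i, j = max(
        combinations(range(3), 2),
        key=lambda p: correlation(residuals[p[0]], residuals[p[1]]),
    )
    k = 3 - i - j  # index of the remaining residual signal
    return [
        (list(w), [first_preset] * n),
        (list(residuals[i]), list(residuals[j])),
        (list(residuals[k]), [second_preset] * n),
    ]
```

Pairing the most highly correlated residual signals lets a stereo encoder exploit their inter-channel redundancy, while the preset mono signal simply pads the leftover signal into a two-channel input.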
Optionally, the encoding submodule is configured to:
encode the one virtual speaker signal and each of the three residual signals into the bitstream separately through a mono encoder.
Optionally, the apparatus 1600 further includes:
a second encoding module, configured to, if the encoding scheme of the current frame is the first encoding scheme, encode the HOA signal into the bitstream according to the first encoding scheme;
a third encoding module, configured to, if the encoding scheme of the current frame is the second encoding scheme, encode the HOA signal into the bitstream according to the second encoding scheme.
Optionally, the first determining module 1601 includes:
a second determining submodule, configured to determine an initial encoding scheme of the current frame according to the HOA signal, where the initial encoding scheme is the first encoding scheme or the second encoding scheme;
a third determining submodule, configured to, if the initial encoding scheme of the current frame is the same as the initial encoding scheme of the frame preceding the current frame, determine that the encoding scheme of the current frame is the initial encoding scheme of the current frame;
a fourth determining submodule, configured to, if the initial encoding scheme of the current frame is the first encoding scheme and the initial encoding scheme of the preceding frame is the second encoding scheme, or the initial encoding scheme of the current frame is the second encoding scheme and the initial encoding scheme of the preceding frame is the first encoding scheme, determine that the encoding scheme of the current frame is the third encoding scheme.
Optionally, the apparatus 1600 further includes:
a fourth encoding module, configured to encode the indication information of the initial encoding scheme of the current frame into the bitstream.
Optionally, the apparatus 1600 further includes:
a second determining module, configured to determine the value of a switching flag of the current frame, where the value of the switching flag of the current frame is a first value when the encoding scheme of the current frame is the first encoding scheme or the second encoding scheme, and is a second value when the encoding scheme of the current frame is the third encoding scheme;
a fifth encoding module, configured to encode the value of the switching flag into the bitstream.
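The encoder-side scheme decision and the switching flag can be illustrated over a short sequence of frames. In this sketch the integer codes for the three schemes, the flag values 0 (first value) and 1 (second value), the function name `assign_schemes`, and the treatment of the first frame as a non-switching frame are all assumptions; the application does not fix these values.

```python
from typing import List, Optional, Tuple

FIRST_SCHEME = 1   # HOA encoding based on directional audio coding
SECOND_SCHEME = 2  # HOA encoding based on virtual speaker selection
THIRD_SCHEME = 3   # hybrid encoding used for switching frames


def assign_schemes(initial_schemes: List[int]) -> List[Tuple[int, int]]:
    """Map per-frame initial schemes to (encoding scheme, switching flag) pairs."""
    result: List[Tuple[int, int]] = []
    previous: Optional[int] = None
    for initial in initial_schemes:
        if previous is None or initial == previous:
            # Same initial scheme as the preceding frame: non-switching frame.
            scheme, flag = initial, 0
        else:
            # Initial scheme changed: encode this frame with the third scheme.
            scheme, flag = THIRD_SCHEME, 1
        result.append((scheme, flag))
        # The comparison for the next frame uses the initial scheme, not the final one.
        previous = initial
    return result
```

For the initial-scheme sequence 1, 1, 2, 2, 1, the sketch marks the third and fifth frames as switching frames, and every other frame keeps its initial scheme.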
Optionally, the apparatus 1600 further includes:
a sixth encoding module, configured to encode the indication information of the encoding scheme of the current frame into the bitstream.
Optionally, the specified channel is consistent with the transmission channel preset in the first encoding scheme.
In the embodiments of the present application, two schemes (a codec scheme based on virtual speaker selection and a codec scheme based on directional audio coding) are combined to encode and decode the HOA signals of audio frames; that is, a suitable codec scheme is selected for each audio frame, which improves the compression rate of the audio signal. In addition, to achieve a smooth transition in auditory quality when switching between codec schemes, certain audio frames are not encoded and decoded directly with either of the two schemes but with a new codec scheme, in which the signal of the specified channel in the HOA signal of these audio frames is encoded into the bitstream. This compromise scheme allows the auditory quality to transition smoothly when the HOA signal recovered by decoding is rendered and played.
It should be noted that the division into the functional modules described above is only an example of how the encoding apparatus provided by the foregoing embodiment encodes audio frames. In practical applications, the functions may be allocated to different functional modules as required; that is, the internal structure of the apparatus may be divided into different functional modules to implement all or some of the functions described above. In addition, the encoding apparatus provided by the foregoing embodiment and the encoding method embodiments are based on the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.
FIG. 17 is a schematic structural diagram of a decoding apparatus 1700 provided by an embodiment of the present application. The decoding apparatus 1700 may be implemented by software, hardware, or a combination of the two as part or all of a decoder-side device, and the decoder-side device may be any decoder-side device in the foregoing embodiments. Referring to FIG. 17, the apparatus 1700 includes a first obtaining module 1701, a first determining module 1702, a second determining module 1703, a third determining module 1704, and a second obtaining module 1705.
The first obtaining module 1701 is configured to obtain the decoding scheme of the current frame based on the bitstream, where the decoding scheme of the current frame is one of a first decoding scheme, a second decoding scheme, and a third decoding scheme; the first decoding scheme is a higher-order ambisonics (HOA) decoding scheme based on directional audio decoding, the second decoding scheme is an HOA decoding scheme based on virtual speaker selection, and the third decoding scheme is a hybrid decoding scheme;
The first determining module 1702 is configured to, if the decoding scheme of the current frame is the third decoding scheme, determine the signal of a specified channel in the HOA signal of the current frame based on the bitstream, where the specified channel is a subset of all the channels of the HOA signal;
The second determining module 1703 is configured to determine, based on the signal of the specified channel, the gains of one or more remaining channels in the HOA signal other than the specified channel;
The third determining module 1704 is configured to determine the signal of each of the one or more remaining channels based on the signal of the specified channel and the gains of the one or more remaining channels;
The second obtaining module 1705 is configured to obtain the reconstructed HOA signal of the current frame based on the signal of the specified channel and the signals of the one or more remaining channels.
Optionally, the first determining module 1702 includes:
a first determining submodule, configured to determine a virtual speaker signal and a residual signal based on the bitstream;
a second determining submodule, configured to determine the signal of the specified channel based on the virtual speaker signal and the residual signal.
可选地,第一确定子模块用于:Optionally, the first determination submodule is used for:
通过立体声解码器对码流进行解码,以得到三路立体声信号;Decode the code stream through a stereo decoder to obtain three stereo signals;
基于这三路立体声信号,确定一路虚拟扬声器信号和三路残差信号。Based on the three stereo signals, one virtual speaker signal and three residual signals are determined.
Optionally, the first determining submodule is configured to:
Determine the one virtual speaker signal based on one of the three stereo signals;
Determine the three residual signals based on the other two of the three stereo signals.
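The split of three decoded stereo signals into one virtual speaker signal and three residual signals can be sketched as follows. This is only an illustrative sketch: the function name `unpack_stereo_signals` and the slot layout (virtual speaker signal in the first slot of the first pair, two residuals in the second pair, the remaining residual in the first slot of the third pair) are assumptions chosen to mirror the encoder-side pairing in claims 5 and 6, not a layout stated by the application.

```python
def unpack_stereo_signals(stereo_signals):
    """Split three decoded stereo signals into one virtual speaker signal
    and three residual signals.

    Each stereo signal is a (first, second) pair of sample lists.
    The slot layout used here is an assumption made for illustration.
    """
    if len(stereo_signals) != 3:
        raise ValueError("expected exactly three stereo signals")
    # Pair 1: virtual speaker signal + preset mono signal (discarded).
    # Pair 2: two residual signals.
    # Pair 3: remaining residual signal + preset mono signal (discarded).
    (virtual_speaker, _preset_1), (res_a, res_b), (res_c, _preset_2) = stereo_signals
    return list(virtual_speaker), [list(res_a), list(res_b), list(res_c)]


stereo = [
    ([0.5, 0.4], [0.0, 0.0]),  # virtual speaker + all-zero preset signal
    ([0.1, 0.2], [0.1, 0.3]),  # two residuals paired together
    ([0.9, 0.8], [0.0, 0.0]),  # remaining residual + all-zero preset signal
]
virtual_speaker, residuals = unpack_stereo_signals(stereo)
```

The preset mono signals carry no information, so the sketch simply drops them after decoding.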
Optionally, the first determining submodule is configured to:
Decode the code stream with a mono decoder to obtain one virtual speaker signal and three residual signals.
Optionally, the signal of the specified channel includes a first-order ambisonics (FOA) signal, and the FOA signal includes an omnidirectional W signal and directional X, Y, and Z signals;
The first determining submodule is configured to:
Determine the W signal based on the virtual speaker signal;
Determine the X signal, the Y signal, and the Z signal based on the residual signal and the W signal, or determine the X signal, the Y signal, and the Z signal based on the residual signal alone.
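The two alternatives for recovering the FOA channels can be sketched as follows. This is a hedged illustration, not the method of the application: the function name `reconstruct_foa` is hypothetical, and for the first alternative the sketch assumes that each residual was formed as the difference between a directional channel and W, which the text does not specify.

```python
def reconstruct_foa(virtual_speaker, residuals, residuals_relative_to_w=False):
    """Recover W, X, Y, Z from one virtual speaker signal and three residuals.

    residuals_relative_to_w=False: residuals are taken as X, Y, Z directly.
    residuals_relative_to_w=True:  ASSUMED convention residual = channel - W,
                                   so channel = residual + W.
    """
    if len(residuals) != 3:
        raise ValueError("expected exactly three residual signals")
    w = list(virtual_speaker)  # W is taken from the virtual speaker signal
    if residuals_relative_to_w:
        x, y, z = ([r + s for r, s in zip(res, w)] for res in residuals)
    else:
        x, y, z = (list(res) for res in residuals)
    return {"W": w, "X": x, "Y": y, "Z": z}


foa = reconstruct_foa([1.0, 2.0], [[0.5, 0.5], [0.0, 1.0], [-1.0, 0.0]])
```

Whichever convention the encoder uses to form the residuals, the decoder has to apply its inverse; the difference convention above is only one possibility.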
Optionally, the apparatus 1700 further includes:
A first decoding module, configured to: if the decoding scheme of the current frame is the first decoding scheme, obtain the reconstructed HOA signal of the current frame from the code stream according to the first decoding scheme;
A second decoding module, configured to: if the decoding scheme of the current frame is the second decoding scheme, obtain the reconstructed HOA signal of the current frame from the code stream according to the second decoding scheme.
Optionally, the second decoding module includes:
A first obtaining submodule, configured to obtain an initial HOA signal from the code stream according to the second decoding scheme;
A gain adjustment submodule, configured to: if the decoding scheme of the frame preceding the current frame is the third decoding scheme, perform gain adjustment on the high-order part of the initial HOA signal according to the high-order gain of the preceding frame;
A second obtaining submodule, configured to obtain the reconstructed HOA signal based on the low-order part of the initial HOA signal and the gain-adjusted high-order part.
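The gain adjustment step can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `adjust_high_order` is hypothetical, the first four channels are assumed to form the low-order (FOA) part, and "gain adjustment" is assumed to mean multiplying each high-order channel by the corresponding gain of the preceding frame. The application does not fix any of these details in this passage.

```python
def adjust_high_order(initial_hoa, prev_scheme, prev_high_order_gains, num_low_order=4):
    """Return the reconstructed HOA signal as a list of channels.

    initial_hoa:           list of channels, each a list of samples
    prev_scheme:           decoding scheme of the preceding frame (1, 2, or 3)
    prev_high_order_gains: one gain per high-order channel (assumed layout)
    num_low_order:         number of low-order channels (assumed to be 4)
    """
    low = [list(ch) for ch in initial_hoa[:num_low_order]]
    high = [list(ch) for ch in initial_hoa[num_low_order:]]
    if prev_scheme == 3:
        if len(prev_high_order_gains) != len(high):
            raise ValueError("need one gain per high-order channel")
        # Assumed adjustment: scale each high-order channel by its gain.
        high = [[g * s for s in ch] for g, ch in zip(prev_high_order_gains, high)]
    # The low-order part is passed through unchanged.
    return low + high


hoa = [[1.0], [1.0], [1.0], [1.0], [2.0], [4.0]]
adjusted = adjust_high_order(hoa, prev_scheme=3, prev_high_order_gains=[0.5, 0.25])
```

When the preceding frame used the first or second decoding scheme, the sketch returns the initial HOA signal unchanged.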
Optionally, the first obtaining module 1701 includes:
A first parsing submodule, configured to parse the value of the switching flag of the current frame from the code stream;
A second parsing submodule, configured to: if the value of the switching flag is a first value, parse indication information of the decoding scheme of the current frame from the code stream, where the indication information indicates that the decoding scheme of the current frame is the first decoding scheme or the second decoding scheme;
A third determining submodule, configured to: if the value of the switching flag is a second value, determine that the decoding scheme of the current frame is the third decoding scheme.
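The flag-driven parsing can be sketched as follows. The bit widths and values are assumptions made for illustration only: the sketch treats the switching flag and the indication information as single bits, takes 0 as the first value and 1 as the second value, and maps indicator 0 to the first decoding scheme. The name `parse_decoding_scheme` is hypothetical.

```python
def parse_decoding_scheme(bits, first_value=0, second_value=1):
    """Return 1, 2, or 3 for the decoding scheme of the current frame.

    bits: iterable of 0/1 integers, starting at the switching flag.
    All bit widths and value assignments here are assumptions.
    """
    bit_iter = iter(bits)
    flag = next(bit_iter)
    if flag == second_value:
        # Second value: third decoding scheme, no indicator is read.
        return 3
    if flag != first_value:
        raise ValueError("unexpected switching flag value")
    # First value: read the indicator that selects scheme 1 or scheme 2.
    indicator = next(bit_iter)
    return 1 if indicator == 0 else 2


scheme = parse_decoding_scheme([0, 1])
```

In this layout, a frame that uses the third decoding scheme spends only the flag bit on signaling.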
Optionally, the first obtaining module 1701 includes:
A third parsing submodule, configured to parse indication information of the decoding scheme of the current frame from the code stream, where the indication information indicates that the decoding scheme of the current frame is the first decoding scheme, the second decoding scheme, or the third decoding scheme.
Optionally, the first obtaining module 1701 includes:
A fourth parsing submodule, configured to parse the initial decoding scheme of the current frame from the code stream, where the initial decoding scheme is the first decoding scheme or the second decoding scheme;
A fourth determining submodule, configured to: if the initial decoding scheme of the current frame is the same as the initial decoding scheme of the frame preceding the current frame, determine that the decoding scheme of the current frame is the initial decoding scheme of the current frame;
A fifth determining submodule, configured to: if the initial decoding scheme of the current frame is the first decoding scheme and the initial decoding scheme of the preceding frame is the second decoding scheme, or if the initial decoding scheme of the current frame is the second decoding scheme and the initial decoding scheme of the preceding frame is the first decoding scheme, determine that the decoding scheme of the current frame is the third decoding scheme.
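The rule applied by the fourth and fifth determining submodules can be sketched as follows. The names `Scheme` and `resolve_scheme` are hypothetical and chosen for illustration; the logic itself follows the text above: equal initial schemes are kept, and a change between the first and second schemes yields the third scheme for the current frame.

```python
from enum import Enum


class Scheme(Enum):
    FIRST = 1   # based on directional audio coding
    SECOND = 2  # based on virtual speaker selection
    THIRD = 3   # hybrid scheme used at a switch


def resolve_scheme(initial_current, initial_previous):
    """Derive the decoding scheme of the current frame from the initial
    decoding schemes of the current frame and the preceding frame."""
    allowed = (Scheme.FIRST, Scheme.SECOND)
    if initial_current not in allowed or initial_previous not in allowed:
        raise ValueError("an initial scheme must be FIRST or SECOND")
    if initial_current == initial_previous:
        return initial_current
    # The initial scheme changed between frames: use the hybrid scheme.
    return Scheme.THIRD


decided = resolve_scheme(Scheme.SECOND, Scheme.FIRST)
```

With this rule the third scheme is never signaled explicitly; the decoder infers it by remembering the initial scheme of the preceding frame.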
In the embodiments of the present application, two schemes (the codec scheme based on virtual speaker selection and the codec scheme based on directional audio coding) are combined to encode and decode the HOA signals of audio frames. That is, a suitable codec scheme is selected for each audio frame, which can improve the compression rate of the audio signal. In addition, to achieve a smooth transition in auditory quality when switching between codec schemes, some audio frames are not encoded and decoded directly with either of the two schemes. Instead, a new codec scheme is used for these frames, in which the signal of the specified channel in the HOA signal of the frame is encoded into the code stream. This is a compromise scheme, so that the auditory quality after the decoded HOA signal is rendered and played transitions smoothly.
It should be noted that, when the decoding apparatus provided in the foregoing embodiment decodes an audio frame, the division into the foregoing functional modules is used only as an example. In practical applications, the foregoing functions may be allocated to different functional modules as required; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or some of the functions described above. In addition, the decoding apparatus provided in the foregoing embodiment and the decoding method embodiments belong to the same concept. For the specific implementation process, see the method embodiments; details are not repeated here.
Fig. 18 is a schematic block diagram of a codec apparatus 1800 used in an embodiment of the present application. The codec apparatus 1800 may include a processor 1801, a memory 1802, and a bus system 1803. The processor 1801 and the memory 1802 are connected through the bus system 1803. The memory 1802 is configured to store instructions, and the processor 1801 is configured to execute the instructions stored in the memory 1802 to perform the various encoding or decoding methods described in the embodiments of the present application. To avoid repetition, details are not described here.
In the embodiments of the present application, the processor 1801 may be a central processing unit (CPU), or may be another general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 1802 may include a ROM device or a RAM device. Any other suitable type of storage device may also be used as the memory 1802. The memory 1802 may include code and data 18021 accessed by the processor 1801 using the bus 1803. The memory 1802 may further include an operating system 18023 and an application program 18022, where the application program 18022 includes at least one program that allows the processor 1801 to perform the encoding or decoding method described in the embodiments of the present application. For example, the application program 18022 may include applications 1 to N, which further include an encoding or decoding application (a codec application for short) that performs the encoding or decoding method described in the embodiments of the present application.
In addition to a data bus, the bus system 1803 may include a power bus, a control bus, a status signal bus, and the like. For clarity of illustration, however, the various buses are all labeled as the bus system 1803 in the figure.
Optionally, the codec apparatus 1800 may further include one or more output devices, such as a display 1804. In one example, the display 1804 may be a touch-sensitive display that combines a display with a touch-sensing unit operable to sense touch input. The display 1804 may be connected to the processor 1801 via the bus 1803.
It should be noted that the codec apparatus 1800 may perform the encoding method in the embodiments of the present application, and may also perform the decoding method in the embodiments of the present application.
Those skilled in the art will appreciate that the functions described in connection with the various illustrative logical blocks, modules, and algorithm steps disclosed herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions described by the various illustrative logical blocks, modules, and steps may be stored on or transmitted over a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media, which include any medium that facilitates transfer of a computer program from one place to another (for example, according to a communication protocol). In this manner, computer-readable media may generally correspond to (1) non-transitory tangible computer-readable storage media, or (2) communication media such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementing the techniques described in the present application. A computer program product may include a computer-readable medium.
By way of example and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber-optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that the computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, DVD, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuits. Accordingly, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein. In addition, in some aspects, the functions described by the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or may be incorporated in a combined codec. Moreover, the techniques may be fully implemented in one or more circuits or logic elements. In one example, the various illustrative logical blocks, units, and modules in the encoder 100 and the decoder 200 may be understood as corresponding circuit devices or logic elements.
The techniques of the embodiments of the present application may be implemented in a wide variety of apparatuses or devices, including a wireless handset, an integrated circuit (IC), or a set of ICs (for example, a chipset). Various components, modules, or units are described in the embodiments of the present application to emphasize functional aspects of apparatuses for performing the disclosed techniques, but they do not necessarily need to be realized by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit together with suitable software and/or firmware, or may be provided by interoperating hardware units (including one or more processors as described above).
That is, the foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), a semiconductor medium (for example, a solid state disk (SSD)), or the like. It should be noted that the computer-readable storage medium mentioned in the embodiments of the present application may be a non-volatile storage medium, in other words, a non-transitory storage medium.
It should be understood that "at least one" mentioned herein means one or more, and "a plurality of" means two or more. In the description of the embodiments of the present application, unless otherwise specified, "/" means "or"; for example, A/B may mean A or B. The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone. In addition, to describe the technical solutions of the embodiments of the present application clearly, words such as "first" and "second" are used to distinguish between identical or similar items whose functions and effects are basically the same. Those skilled in the art will understand that words such as "first" and "second" do not limit quantity or execution order, and do not necessarily indicate that the items are different.
It should be noted that the information (including but not limited to user equipment information and user personal information), data (including but not limited to data used for analysis, stored data, and displayed data), and signals involved in the embodiments of the present application are all authorized by the user or fully authorized by all parties, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
The foregoing are embodiments provided by the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims (54)

  1. An encoding method, characterized in that the method comprises:
    determining an encoding scheme of a current frame according to a higher-order ambisonics (HOA) signal of the current frame, where the encoding scheme of the current frame is one of a first encoding scheme, a second encoding scheme, and a third encoding scheme; wherein the first encoding scheme is an HOA encoding scheme based on directional audio coding, the second encoding scheme is an HOA encoding scheme based on virtual speaker selection, and the third encoding scheme is a hybrid encoding scheme;
    if the encoding scheme of the current frame is the third encoding scheme, encoding a signal of a specified channel in the HOA signal into a code stream, where the specified channel is a subset of all channels of the HOA signal.
  2. The method according to claim 1, characterized in that the signal of the specified channel includes a first-order ambisonics (FOA) signal, and the FOA signal includes an omnidirectional W signal and directional X, Y, and Z signals.
  3. The method according to claim 2, characterized in that the encoding the signal of the specified channel in the HOA signal into the code stream comprises:
    determining a virtual speaker signal and a residual signal based on the W signal, the X signal, the Y signal, and the Z signal;
    encoding the virtual speaker signal and the residual signal into the code stream.
  4. The method according to claim 3, characterized in that the determining a virtual speaker signal and a residual signal based on the W signal, the X signal, the Y signal, and the Z signal comprises:
    determining the W signal as one virtual speaker signal;
    determining three residual signals based on the W signal, the X signal, the Y signal, and the Z signal, or determining the X signal, the Y signal, and the Z signal as the three residual signals.
  5. The method according to claim 4, characterized in that the encoding the virtual speaker signal and the residual signal into the code stream comprises:
    combining the one virtual speaker signal with a first preset mono signal to obtain one stereo signal;
    combining the three residual signals with a second preset mono signal to obtain two stereo signals;
    encoding the three resulting stereo signals into the code stream separately through a stereo encoder.
  6. The method according to claim 5, characterized in that the combining the three residual signals with a second preset mono signal to obtain two stereo signals comprises:
    combining the two most highly correlated residual signals among the three residual signals to obtain one of the two stereo signals;
    combining the remaining residual signal, other than the two most highly correlated residual signals, with the second preset mono signal to obtain the other of the two stereo signals.
  7. The method according to claim 5 or 6, characterized in that the first preset mono signal is an all-zero signal or an all-one signal, where the all-zero signal includes a signal whose sample values are all zero or a signal whose frequency-bin values are all zero, and the all-one signal includes a signal whose sample values are all one or a signal whose frequency-bin values are all one;
    the second preset mono signal is an all-zero signal or an all-one signal;
    the first preset mono signal and the second preset mono signal are the same or different.
  8. The method according to claim 4, characterized in that the encoding the virtual speaker signal and the residual signal into the code stream comprises:
    encoding the one virtual speaker signal and each of the three residual signals into the code stream separately through a mono encoder.
  9. The method according to any one of claims 1-8, characterized in that, after the determining the encoding scheme of the current frame according to the higher-order ambisonics (HOA) signal of the current frame, the method further comprises:
    if the encoding scheme of the current frame is the first encoding scheme, encoding the HOA signal into the code stream according to the first encoding scheme;
    if the encoding scheme of the current frame is the second encoding scheme, encoding the HOA signal into the code stream according to the second encoding scheme.
  10. The method according to any one of claims 1-9, characterized in that the determining the encoding scheme of the current frame according to the higher-order ambisonics (HOA) signal of the current frame comprises:
    determining an initial encoding scheme of the current frame according to the HOA signal, where the initial encoding scheme is the first encoding scheme or the second encoding scheme;
    if the initial encoding scheme of the current frame is the same as the initial encoding scheme of the frame preceding the current frame, determining that the encoding scheme of the current frame is the initial encoding scheme of the current frame;
    if the initial encoding scheme of the current frame is the first encoding scheme and the initial encoding scheme of the frame preceding the current frame is the second encoding scheme, or if the initial encoding scheme of the current frame is the second encoding scheme and the initial encoding scheme of the frame preceding the current frame is the first encoding scheme, determining that the encoding scheme of the current frame is the third encoding scheme.
  11. The method according to claim 10, characterized in that, after the determining the initial encoding scheme of the current frame according to the HOA signal, the method further comprises:
    encoding indication information of the initial encoding scheme of the current frame into the code stream.
  12. The method according to any one of claims 1-11, characterized in that, after the determining the encoding scheme of the current frame according to the higher-order ambisonics (HOA) signal of the current frame, the method further comprises:
    determining a value of a switching flag of the current frame, where the value of the switching flag of the current frame is a first value when the encoding scheme of the current frame is the first encoding scheme or the second encoding scheme, and is a second value when the encoding scheme of the current frame is the third encoding scheme;
    encoding the value of the switching flag into the code stream.
  13. The method according to any one of claims 1-10, characterized in that, after the determining the encoding scheme of the current frame according to the HOA signal of the current frame, the method further comprises:
    encoding indication information of the encoding scheme of the current frame into the code stream.
  14. The method according to any one of claims 1-13, characterized in that the specified channel is consistent with a preset transmission channel in the first encoding scheme.
  15. A decoding method, characterized in that the method comprises:
    obtaining a decoding scheme of a current frame based on a code stream, where the decoding scheme of the current frame is one of a first decoding scheme, a second decoding scheme, and a third decoding scheme; wherein the first decoding scheme is a higher-order ambisonics (HOA) decoding scheme based on directional audio decoding, the second decoding scheme is an HOA decoding scheme based on virtual speaker selection, and the third decoding scheme is a hybrid decoding scheme;
    if the decoding scheme of the current frame is the third decoding scheme, determining, based on the code stream, a signal of a specified channel in the HOA signal of the current frame, where the specified channel is a subset of all channels of the HOA signal;
    determining, based on the signal of the specified channel, a gain of one or more remaining channels in the HOA signal other than the specified channel;
    determining, based on the signal of the specified channel and the gain of the one or more remaining channels, a signal of each of the one or more remaining channels;
    obtaining a reconstructed HOA signal of the current frame based on the signal of the specified channel and the signals of the one or more remaining channels.
  16. The method according to claim 15, characterized in that the determining, based on the code stream, the signal of the specified channel in the HOA signal of the current frame comprises:
    determining a virtual speaker signal and a residual signal based on the code stream;
    determining the signal of the specified channel based on the virtual speaker signal and the residual signal.
  17. The method according to claim 16, wherein determining the virtual speaker signal and the residual signal based on the bitstream comprises:
    decoding the bitstream by using a stereo decoder to obtain three stereo signals; and
    determining one virtual speaker signal and three residual signals based on the three stereo signals.
  18. The method according to claim 17, wherein determining the one virtual speaker signal and the three residual signals based on the three stereo signals comprises:
    determining the one virtual speaker signal based on one of the three stereo signals; and
    determining the three residual signals based on the other two of the three stereo signals.
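A minimal sketch of this unpacking step, assuming a fixed layout that the claims do not specify: the first stereo signal carries the virtual speaker signal plus a preset padding channel, the second carries two residual signals, and the third carries the last residual signal plus a preset padding channel. The function name `unpack_stereo` and the tuple layout are hypothetical.

```python
def unpack_stereo(stereo_signals):
    """Split three decoded stereo signals into 1 virtual speaker signal
    and 3 residual signals.

    Each stereo signal is a (left, right) pair of sample lists.
    Assumed layout (not stated in the source):
      stereo 0 = (virtual speaker, preset mono padding)
      stereo 1 = (residual a, residual b)
      stereo 2 = (residual c, preset mono padding)
    """
    (virtual_speaker, _pad_a), (res_a, res_b), (res_c, _pad_b) = stereo_signals
    return virtual_speaker, [res_a, res_b, res_c]


decoded = [
    ([1, 2], [0, 0]),   # virtual speaker + all-zero padding
    ([3, 4], [5, 6]),   # two residual signals
    ([7, 8], [0, 0]),   # last residual + all-zero padding
]
vs, residuals = unpack_stereo(decoded)
```

The padding channels are simply discarded, which matches the idea that the preset mono signals carry no information.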
  19. The method according to claim 16, wherein determining the virtual speaker signal and the residual signal based on the bitstream comprises:
    decoding the bitstream by using a mono decoder to obtain one virtual speaker signal and three residual signals.
  20. The method according to any one of claims 16 to 19, wherein the signals of the designated channels comprise a first-order ambisonics (FOA) signal, and the FOA signal comprises an omnidirectional W signal and directional X, Y, and Z signals; and
    determining the signals of the designated channels based on the virtual speaker signal and the residual signal comprises:
    determining the W signal based on the virtual speaker signal; and
    determining the X signal, the Y signal, and the Z signal based on the residual signal and the W signal, or determining the X signal, the Y signal, and the Z signal based on the residual signal.
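The two alternatives in claim 20 can be illustrated with a short sketch. It assumes the encoder behaviour described in the apparatus claims (W is sent as the virtual speaker signal, and each residual is either a directional signal minus W or the directional signal itself). The function name `rebuild_foa` and the `residual_is_difference` switch are hypothetical.

```python
def rebuild_foa(virtual_speaker, residuals, residual_is_difference):
    """Recover W, X, Y, Z from one virtual speaker signal and three residuals.

    residual_is_difference=True  -> residuals are assumed to be X-W, Y-W, Z-W
    residual_is_difference=False -> residuals are assumed to be X, Y, Z
    """
    w = list(virtual_speaker)  # assumption: W is the virtual speaker signal
    if residual_is_difference:
        # Add W back sample by sample to undo the encoder-side subtraction.
        xyz = [[r + wv for r, wv in zip(res, w)] for res in residuals]
    else:
        xyz = [list(res) for res in residuals]
    x, y, z = xyz
    return w, x, y, z


w, x, y, z = rebuild_foa([1, 2], [[1, 1], [0, 0], [-1, -2]], True)
```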
  21. The method according to any one of claims 15 to 20, wherein the method further comprises:
    if the decoding scheme of the current frame is the first decoding scheme, obtaining the reconstructed HOA signal of the current frame from the bitstream according to the first decoding scheme; and
    if the decoding scheme of the current frame is the second decoding scheme, obtaining the reconstructed HOA signal of the current frame from the bitstream according to the second decoding scheme.
  22. The method according to claim 21, wherein obtaining the reconstructed HOA signal of the current frame from the bitstream according to the second decoding scheme comprises:
    obtaining an initial HOA signal from the bitstream according to the second decoding scheme;
    if the decoding scheme of the frame preceding the current frame is the third decoding scheme, performing gain adjustment on a high-order part of the initial HOA signal according to a high-order gain of the frame preceding the current frame; and
    obtaining the reconstructed HOA signal based on a low-order part of the initial HOA signal and the gain-adjusted high-order part.
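The transition handling in claim 22 can be sketched as below. The source does not define how the gain adjustment is computed, so this example assumes a single scalar gain that multiplies every high-order channel, and assumes the low-order part is the first four (FOA) channels. `adjust_after_switch`, `num_low_order`, and the scheme numbering 1/2/3 are hypothetical names and conventions.

```python
def adjust_after_switch(initial_hoa, prev_scheme, prev_high_order_gain,
                        num_low_order=4):
    """Return the reconstructed HOA signal for a frame decoded with scheme 2.

    initial_hoa          -- list of channels (lists of samples)
    prev_scheme          -- decoding scheme of the previous frame (1, 2 or 3)
    prev_high_order_gain -- high-order gain kept from the previous frame
    """
    low = [list(ch) for ch in initial_hoa[:num_low_order]]
    high = initial_hoa[num_low_order:]
    if prev_scheme == 3:
        # Assumption: gain adjustment means scaling each high-order sample.
        high = [[prev_high_order_gain * s for s in ch] for ch in high]
    else:
        high = [list(ch) for ch in high]
    # Low-order channels pass through unchanged in both cases.
    return low + high


initial = [[1.0] for _ in range(4)] + [[2.0] for _ in range(5)]
smoothed = adjust_after_switch(initial, 3, 0.5)
untouched = adjust_after_switch(initial, 2, 0.5)
```

The intent, as far as the claim indicates, is to smooth the high-order content across a frame boundary that follows a hybrid frame.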
  23. The method according to any one of claims 15 to 22, wherein obtaining the decoding scheme of the current frame based on the bitstream comprises:
    parsing a value of a switching flag of the current frame from the bitstream;
    if the value of the switching flag is a first value, parsing indication information of the decoding scheme of the current frame from the bitstream, wherein the indication information indicates that the decoding scheme of the current frame is the first decoding scheme or the second decoding scheme; and
    if the value of the switching flag is a second value, determining that the decoding scheme of the current frame is the third decoding scheme.
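The flag-based parsing in claim 23 can be sketched as follows. The bit widths and values are assumptions, since the claims only speak of a "first value" and a "second value": here the switching flag is one bit (0 = first value, 1 = second value) and the scheme indicator is one further bit. `parse_decoding_scheme` is a hypothetical helper operating on a list of bits.

```python
FIRST_VALUE = 0   # assumed flag value: scheme 1 or 2, indicator follows
SECOND_VALUE = 1  # assumed flag value: scheme 3 (hybrid), no indicator


def parse_decoding_scheme(bits):
    """Return the decoding scheme (1, 2 or 3) signalled at the frame start."""
    reader = iter(bits)
    switch_flag = next(reader)
    if switch_flag == FIRST_VALUE:
        # Only in this branch is the indicator present in the bitstream.
        indicator = next(reader)
        return 1 if indicator == 0 else 2
    return 3


scheme_a = parse_decoding_scheme([0, 0])
scheme_b = parse_decoding_scheme([0, 1])
scheme_c = parse_decoding_scheme([1])
```

With this layout a hybrid frame spends a single bit on scheme signalling.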
  24. The method according to any one of claims 15 to 22, wherein obtaining the decoding scheme of the current frame based on the bitstream comprises:
    parsing indication information of the decoding scheme of the current frame from the bitstream, wherein the indication information indicates that the decoding scheme of the current frame is the first decoding scheme, the second decoding scheme, or the third decoding scheme.
  25. The method according to any one of claims 15 to 22, wherein obtaining the decoding scheme of the current frame based on the bitstream comprises:
    parsing an initial decoding scheme of the current frame from the bitstream, wherein the initial decoding scheme is the first decoding scheme or the second decoding scheme;
    if the initial decoding scheme of the current frame is the same as the initial decoding scheme of the frame preceding the current frame, determining that the decoding scheme of the current frame is the initial decoding scheme of the current frame; and
    if the initial decoding scheme of the current frame is the first decoding scheme and the initial decoding scheme of the preceding frame is the second decoding scheme, or the initial decoding scheme of the current frame is the second decoding scheme and the initial decoding scheme of the preceding frame is the first decoding scheme, determining that the decoding scheme of the current frame is the third decoding scheme.
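The rule in claim 25 reduces to a comparison of two initial schemes, which the following sketch makes explicit. The integer codes 1, 2, and 3 for the three schemes and the name `resolve_scheme` are illustrative assumptions; the handling of the very first frame (no preceding frame) is not covered by the claim and is left out here.

```python
def resolve_scheme(current_initial, previous_initial):
    """Derive the actual scheme from the initial schemes of two frames.

    Both arguments must be 1 (first scheme) or 2 (second scheme).
    Returns 3 (hybrid scheme) when the initial scheme changes between frames.
    """
    for value in (current_initial, previous_initial):
        if value not in (1, 2):
            raise ValueError("initial scheme must be 1 or 2")
    if current_initial == previous_initial:
        return current_initial
    return 3


# Initial schemes of consecutive frames, each paired with its predecessor.
initial_schemes = [1, 1, 2, 2, 1]
resolved = [resolve_scheme(cur, prev)
            for prev, cur in zip(initial_schemes, initial_schemes[1:])]
```

In this example the hybrid scheme appears exactly on the frames where the initial scheme changes, so it acts as a one-frame transition between the two main schemes.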
  26. An encoding apparatus, wherein the apparatus comprises:
    a first determining module, configured to determine an encoding scheme of a current frame according to a higher-order ambisonics (HOA) signal of the current frame, wherein the encoding scheme of the current frame is one of a first encoding scheme, a second encoding scheme, and a third encoding scheme, the first encoding scheme is an HOA encoding scheme based on directional audio coding, the second encoding scheme is an HOA encoding scheme based on virtual speaker selection, and the third encoding scheme is a hybrid encoding scheme; and
    a first encoding module, configured to: if the encoding scheme of the current frame is the third encoding scheme, encode signals of designated channels in the HOA signal into a bitstream, wherein the designated channels are a subset of all channels of the HOA signal.
  27. The apparatus according to claim 26, wherein the signals of the designated channels comprise a first-order ambisonics (FOA) signal, and the FOA signal comprises an omnidirectional W signal and directional X, Y, and Z signals.
  28. The apparatus according to claim 27, wherein the first encoding module comprises:
    a first determining submodule, configured to determine a virtual speaker signal and a residual signal based on the W signal, the X signal, the Y signal, and the Z signal; and
    an encoding submodule, configured to encode the virtual speaker signal and the residual signal into the bitstream.
  29. The apparatus according to claim 28, wherein the first determining submodule is configured to:
    determine the W signal as one virtual speaker signal; and
    determine the difference signals between the W signal and each of the X signal, the Y signal, and the Z signal as three residual signals, or determine the X signal, the Y signal, and the Z signal as the three residual signals.
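The two options in claim 29 can be sketched on the encoder side as below. The sign convention of the difference (directional signal minus W) is an assumption, as the claim only says "difference signal"; `split_foa` and `use_difference` are hypothetical names.

```python
def split_foa(w, x, y, z, use_difference):
    """Map an FOA frame to one virtual speaker signal and three residuals.

    use_difference=True  -> residuals are X-W, Y-W, Z-W (assumed sign)
    use_difference=False -> residuals are X, Y, Z unchanged
    """
    virtual_speaker = list(w)  # W is taken directly as the virtual speaker
    if use_difference:
        residuals = [[d - wv for d, wv in zip(ch, w)] for ch in (x, y, z)]
    else:
        residuals = [list(ch) for ch in (x, y, z)]
    return virtual_speaker, residuals


vs, res = split_foa([1, 2], [2, 3], [1, 2], [0, 0], True)
```

Using differences can reduce the energy of the residuals when the directional signals resemble W, which is a common motivation for residual coding in general, not a statement taken from this document.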
  30. The apparatus according to claim 29, wherein the encoding submodule is configured to:
    combine the one virtual speaker signal with a first preset mono signal to obtain one stereo signal;
    combine the three residual signals with a second preset mono signal to obtain two stereo signals; and
    encode the resulting three stereo signals into the bitstream separately by using a stereo encoder.
  31. The apparatus according to claim 30, wherein the encoding submodule is configured to:
    combine the two most highly correlated residual signals among the three residual signals to obtain one of the two stereo signals; and
    combine the remaining residual signal, other than the two most highly correlated residual signals, with the second preset mono signal to obtain the other of the two stereo signals.
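A minimal sketch of the pairing in claim 31. The claim does not name a correlation measure, so a normalized cross-correlation at zero lag is assumed here; `normalized_correlation` and `pack_residuals` are hypothetical helpers, and the returned index triple is only for illustration (the source does not say how the pairing is signalled to the decoder).

```python
import math
from itertools import combinations


def normalized_correlation(a, b):
    # Zero-lag normalized cross-correlation; an assumed similarity measure.
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den else 0.0


def pack_residuals(residuals, preset_mono):
    """Pair the two most correlated residuals; pad the third with a preset."""
    i, j = max(
        combinations(range(3), 2),
        key=lambda pair: normalized_correlation(residuals[pair[0]],
                                                residuals[pair[1]]),
    )
    k = ({0, 1, 2} - {i, j}).pop()       # index of the leftover residual
    stereo_a = (residuals[i], residuals[j])
    stereo_b = (residuals[k], preset_mono)
    return stereo_a, stereo_b, (i, j, k)


res = [[1, 2, 3], [1, 0, -1], [2, 4, 6]]
stereo_a, stereo_b, order = pack_residuals(res, [0, 0, 0])
```

Pairing correlated signals lets a stereo encoder exploit the redundancy between its two channels, which is presumably why the claim selects the most correlated pair.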
  32. The apparatus according to claim 30 or 31, wherein the first preset mono signal is an all-zero signal or an all-one signal, the all-zero signal comprises a signal whose sample values are all zero or a signal whose frequency-bin values are all zero, and the all-one signal comprises a signal whose sample values are all one or a signal whose frequency-bin values are all one;
    the second preset mono signal is an all-zero signal or an all-one signal; and
    the first preset mono signal and the second preset mono signal are the same or different.
  33. The apparatus according to claim 29, wherein the encoding submodule is configured to:
    encode the one virtual speaker signal and each of the three residual signals into the bitstream separately by using a mono encoder.
  34. The apparatus according to any one of claims 26 to 33, wherein the apparatus further comprises:
    a second encoding module, configured to: if the encoding scheme of the current frame is the first encoding scheme, encode the HOA signal into the bitstream according to the first encoding scheme; and
    a third encoding module, configured to: if the encoding scheme of the current frame is the second encoding scheme, encode the HOA signal into the bitstream according to the second encoding scheme.
  35. The apparatus according to any one of claims 26 to 34, wherein the first determining module comprises:
    a second determining submodule, configured to determine an initial encoding scheme of the current frame according to the HOA signal, wherein the initial encoding scheme is the first encoding scheme or the second encoding scheme;
    a third determining submodule, configured to: if the initial encoding scheme of the current frame is the same as the initial encoding scheme of the frame preceding the current frame, determine that the encoding scheme of the current frame is the initial encoding scheme of the current frame; and
    a fourth determining submodule, configured to: if the initial encoding scheme of the current frame is the first encoding scheme and the initial encoding scheme of the preceding frame is the second encoding scheme, or the initial encoding scheme of the current frame is the second encoding scheme and the initial encoding scheme of the preceding frame is the first encoding scheme, determine that the encoding scheme of the current frame is the third encoding scheme.
  36. The apparatus according to claim 35, wherein the apparatus further comprises:
    a fourth encoding module, configured to encode indication information of the initial encoding scheme of the current frame into the bitstream.
  37. The apparatus according to any one of claims 26 to 36, wherein the apparatus further comprises:
    a second determining module, configured to determine a value of a switching flag of the current frame, wherein when the encoding scheme of the current frame is the first encoding scheme or the second encoding scheme, the value of the switching flag of the current frame is a first value, and when the encoding scheme of the current frame is the third encoding scheme, the value of the switching flag of the current frame is a second value; and
    a fifth encoding module, configured to encode the value of the switching flag into the bitstream.
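The flag assignment in claim 37 can be sketched as follows. The concrete values (0 and 1), the one-bit scheme indicator, and the helper names are assumptions made for the example; the claim itself only distinguishes a "first value" from a "second value" and does not state that an indicator follows the flag.

```python
FIRST_VALUE = 0   # assumed flag value for the first or second scheme
SECOND_VALUE = 1  # assumed flag value for the third (hybrid) scheme


def switching_flag(scheme):
    """Return the switching flag value for encoding scheme 1, 2 or 3."""
    if scheme in (1, 2):
        return FIRST_VALUE
    if scheme == 3:
        return SECOND_VALUE
    raise ValueError("scheme must be 1, 2 or 3")


def write_frame_header(scheme):
    """Return the header bits: the flag, plus an assumed 1-bit indicator."""
    bits = [switching_flag(scheme)]
    if scheme in (1, 2):
        bits.append(scheme - 1)  # hypothetical indicator: 0 -> scheme 1
    return bits


headers = [write_frame_header(s) for s in (1, 2, 3)]
```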
  38. The apparatus according to any one of claims 26 to 35, wherein the apparatus further comprises:
    a sixth encoding module, configured to encode indication information of the encoding scheme of the current frame into the bitstream.
  39. The apparatus according to any one of claims 26 to 38, wherein the designated channels are the same as the transmission channels preset in the first encoding scheme.
  40. A decoding apparatus, wherein the apparatus comprises:
    a first obtaining module, configured to obtain a decoding scheme of a current frame based on a bitstream, wherein the decoding scheme of the current frame is one of a first decoding scheme, a second decoding scheme, and a third decoding scheme, the first decoding scheme is a higher-order ambisonics (HOA) decoding scheme based on directional audio decoding, the second decoding scheme is an HOA decoding scheme based on virtual speaker selection, and the third decoding scheme is a hybrid decoding scheme;
    a first determining module, configured to: if the decoding scheme of the current frame is the third decoding scheme, determine, based on the bitstream, signals of designated channels in an HOA signal of the current frame, wherein the designated channels are a subset of all channels of the HOA signal;
    a second determining module, configured to determine, based on the signals of the designated channels, gains of one or more remaining channels of the HOA signal other than the designated channels;
    a third determining module, configured to determine a signal of each of the one or more remaining channels based on the signals of the designated channels and the gains of the one or more remaining channels; and
    a second obtaining module, configured to obtain a reconstructed HOA signal of the current frame based on the signals of the designated channels and the signals of the one or more remaining channels.
  41. The apparatus according to claim 40, wherein the first determining module comprises:
    a first determining submodule, configured to determine a virtual speaker signal and a residual signal based on the bitstream; and
    a second determining submodule, configured to determine the signals of the designated channels based on the virtual speaker signal and the residual signal.
  42. The apparatus according to claim 41, wherein the first determining submodule is configured to:
    decode the bitstream by using a stereo decoder to obtain three stereo signals; and
    determine one virtual speaker signal and three residual signals based on the three stereo signals.
  43. The apparatus according to claim 42, wherein the first determining submodule is configured to:
    determine the one virtual speaker signal based on one of the three stereo signals; and
    determine the three residual signals based on the other two of the three stereo signals.
  44. The apparatus according to claim 41, wherein the first determining submodule is configured to:
    decode the bitstream by using a mono decoder to obtain one virtual speaker signal and three residual signals.
  45. The apparatus according to any one of claims 41 to 44, wherein the signals of the designated channels comprise a first-order ambisonics (FOA) signal, and the FOA signal comprises an omnidirectional W signal and directional X, Y, and Z signals; and
    the first determining submodule is configured to:
    determine the W signal based on the virtual speaker signal; and
    determine the X signal, the Y signal, and the Z signal based on the residual signal and the W signal, or determine the X signal, the Y signal, and the Z signal based on the residual signal.
  46. The apparatus according to any one of claims 40 to 45, wherein the apparatus further comprises:
    a first decoding module, configured to: if the decoding scheme of the current frame is the first decoding scheme, obtain the reconstructed HOA signal of the current frame from the bitstream according to the first decoding scheme; and
    a second decoding module, configured to: if the decoding scheme of the current frame is the second decoding scheme, obtain the reconstructed HOA signal of the current frame from the bitstream according to the second decoding scheme.
  47. The apparatus according to claim 46, wherein the second decoding module comprises:
    a first obtaining submodule, configured to obtain an initial HOA signal from the bitstream according to the second decoding scheme;
    a gain adjustment submodule, configured to: if the decoding scheme of the frame preceding the current frame is the third decoding scheme, perform gain adjustment on a high-order part of the initial HOA signal according to a high-order gain of the frame preceding the current frame; and
    a second obtaining submodule, configured to obtain the reconstructed HOA signal based on a low-order part of the initial HOA signal and the gain-adjusted high-order part.
  48. The apparatus according to any one of claims 40 to 47, wherein the first obtaining module comprises:
    a first parsing submodule, configured to parse a value of a switching flag of the current frame from the bitstream;
    a second parsing submodule, configured to: if the value of the switching flag is a first value, parse indication information of the decoding scheme of the current frame from the bitstream, wherein the indication information indicates that the decoding scheme of the current frame is the first decoding scheme or the second decoding scheme; and
    a third determining submodule, configured to: if the value of the switching flag is a second value, determine that the decoding scheme of the current frame is the third decoding scheme.
  49. The apparatus according to any one of claims 40 to 47, wherein the first obtaining module comprises:
    a third parsing submodule, configured to parse indication information of the decoding scheme of the current frame from the bitstream, wherein the indication information indicates that the decoding scheme of the current frame is the first decoding scheme, the second decoding scheme, or the third decoding scheme.
  50. The apparatus according to any one of claims 40 to 47, wherein the first obtaining module comprises:
    a fourth parsing submodule, configured to parse an initial decoding scheme of the current frame from the bitstream, wherein the initial decoding scheme is the first decoding scheme or the second decoding scheme;
    a fourth determining submodule, configured to: if the initial decoding scheme of the current frame is the same as the initial decoding scheme of the frame preceding the current frame, determine that the decoding scheme of the current frame is the initial decoding scheme of the current frame; and
    a fifth determining submodule, configured to: if the initial decoding scheme of the current frame is the first decoding scheme and the initial decoding scheme of the preceding frame is the second decoding scheme, or the initial decoding scheme of the current frame is the second decoding scheme and the initial decoding scheme of the preceding frame is the first decoding scheme, determine that the decoding scheme of the current frame is the third decoding scheme.
  51. An encoder-side device, wherein the encoder-side device comprises a memory and a processor; and
    the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory to implement the encoding method according to any one of claims 1 to 14.
  52. A decoder-side device, wherein the decoder-side device comprises a memory and a processor; and
    the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory to implement the decoding method according to any one of claims 15 to 25.
  53. A computer-readable storage medium, wherein the storage medium stores instructions, and when the instructions are run on a computer, the computer is caused to perform the steps of the method according to any one of claims 1 to 25.
  54. A computer program product, wherein the computer program product comprises instructions, and when the instructions are executed by a processor, the method according to any one of claims 1 to 25 is implemented.
PCT/CN2022/120495 2021-09-29 2022-09-22 Encoding and decoding method and apparatus, and device, storage medium and computer program product WO2023051368A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111155384.0 2021-09-29
CN202111155384.0A CN115881140A (en) 2021-09-29 2021-09-29 Encoding and decoding method, device, equipment, storage medium and computer program product

Publications (1)

Publication Number Publication Date
WO2023051368A1 true WO2023051368A1 (en) 2023-04-06

Family

ID=85756476

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/120495 WO2023051368A1 (en) 2021-09-29 2022-09-22 Encoding and decoding method and apparatus, and device, storage medium and computer program product

Country Status (3)

Country Link
CN (1) CN115881140A (en)
TW (1) TW202333139A (en)
WO (1) WO2023051368A1 (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060173675A1 (en) * 2003-03-11 2006-08-03 Juha Ojanpera Switching between coding schemes
WO2011034376A2 (en) * 2009-09-17 2011-03-24 Lg Electronics Inc. A method and an apparatus for processing an audio signal
CN102341851A (en) * 2009-03-06 2012-02-01 株式会社Ntt都科摩 Sound signal coding method, sound signal decoding method, coding device, decoding device, sound signal processing system, sound signal coding program, and sound signal decoding program
US20120320978A1 (en) * 2003-05-12 2012-12-20 Google Inc. Coder optimization using independent bitstream partitions and mixed mode entropy coding
US20150098572A1 (en) * 2012-05-14 2015-04-09 Thomson Licensing Method and apparatus for compressing and decompressing a higher order ambisonics signal representation
US20170164131A1 (en) * 2014-07-02 2017-06-08 Dolby International Ab Method and apparatus for decoding a compressed hoa representation, and method and apparatus for encoding a compressed hoa representation
US20170365264A1 (en) * 2015-03-09 2017-12-21 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal
CN109215668A (en) * 2017-06-30 2019-01-15 华为技术有限公司 A kind of coding method of interchannel phase differences parameter and device
CN112074902A (en) * 2018-02-01 2020-12-11 弗劳恩霍夫应用研究促进协会 Audio scene encoder, audio scene decoder, and related methods using hybrid encoder/decoder spatial analysis

Also Published As

Publication number Publication date
CN115881140A (en) 2023-03-31
TW202333139A (en) 2023-08-16

Similar Documents

Publication Publication Date Title
US20150280676A1 (en) Metadata for ducking control
US20140226842A1 (en) Spatial audio processing apparatus
JPWO2008016097A1 (en) Stereo speech coding apparatus, stereo speech decoding apparatus, and methods thereof
CN107277691B (en) Multi-channel audio playing method and system based on cloud and audio gateway device
US20230137053A1 (en) Audio Coding Method and Apparatus
US20230179941A1 (en) Audio Signal Rendering Method and Apparatus
US20210264926A1 (en) Inter-channel phase difference parameter encoding method and apparatus
EP2610867B1 (en) Audio reproducing device and audio reproducing method
WO2021213128A1 (en) Audio signal encoding method and apparatus
GB2592896A (en) Spatial audio parameter encoding and associated decoding
US20230298600A1 (en) Audio encoding and decoding method and apparatus
US20230145725A1 (en) Multi-channel audio signal encoding and decoding method and apparatus
US20230105508A1 (en) Audio Coding Method and Apparatus
WO2023051368A1 (en) Encoding and decoding method and apparatus, and device, storage medium and computer program product
WO2023051367A1 (en) Decoding method and apparatus, and device, storage medium and computer program product
AU2021388397A1 (en) Audio encoding/decoding method and device
WO2023051370A1 (en) Encoding and decoding methods and apparatus, device, storage medium, and computer program
WO2022242534A1 (en) Encoding method and apparatus, decoding method and apparatus, device, storage medium and computer program
WO2022258036A1 (en) Encoding method and apparatus, decoding method and apparatus, and device, storage medium and computer program
US20230197087A1 (en) Spatial audio parameter encoding and associated decoding
JP2024518846A (en) Method and apparatus for encoding three-dimensional audio signals, and encoder
WO2021255327A1 (en) Managing network jitter for multiple audio streams
WO2023023504A1 (en) Wireless surround sound system with common bitstream
Rose et al. Digital infotainment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22874755

Country of ref document: EP

Kind code of ref document: A1