CN116798438A - Encoding and decoding method, encoding and decoding device, and terminal device for a multi-channel signal

Info

Publication number: CN116798438A
Application number: CN202210699863.7A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: channel, mute, signal, flag, information
Inventors: 王智, 王喆, 李海婷
Current and original assignee: Huawei Technologies Co., Ltd.
Application filed by: Huawei Technologies Co., Ltd.
Priority applications: PCT/CN2023/073845 (published as WO2023173941A1); TW112108251A (published as TW202403728A)
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/012 Comfort noise or silence coding
    • G10L19/0018 Speech coding using phonetic or linguistical decoding of the source; Reconstruction using text-to-speech synthesis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing

Abstract

The embodiments of this application disclose an encoding and decoding method, an encoding and decoding device, and a terminal device for a multi-channel signal. The encoding method of the multi-channel signal includes: acquiring mute flag information of the multi-channel signal, the mute flag information including a mute enable flag and/or a mute flag; performing multi-channel encoding processing on the multi-channel signal to obtain a transmission channel signal of each transmission channel; and generating a code stream according to the transmission channel signals of the transmission channels and the mute flag information, the code stream including the mute flag information and the multi-channel encoding result of the transmission channel signals. In the embodiments of this application, the transmission channel signals of the transmission channels are encoded according to the mute flag information to generate the code stream; since the mute condition of the multi-channel signal is taken into account, encoding efficiency and the utilization of encoding bit resources are improved.

Description

Encoding and decoding method, encoding and decoding device, and terminal device for a multi-channel signal
This application claims priority to Chinese Patent Application No. 202210254868.9, entitled "Encoding and decoding method for a multi-channel signal, terminal device, and network device", filed with the China patent office on March 14, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of audio encoding and decoding, and in particular to an encoding and decoding method and device for a multi-channel signal, and a terminal device.
Background
Compression of audio data is an indispensable link in media applications such as media communication and media broadcasting. Audio data may be compressed by multi-channel coding, which may encode a sound bed signal having a plurality of channels, encode a plurality of object audio signals, or encode a mixed signal containing both a sound bed signal and object audio signals.
A sound bed signal, an object signal, or a mixed signal containing both may be input to the audio encoder as a multi-channel signal; however, the characteristics of the individual channels of the multi-channel signal are not exactly the same, and these characteristics also change continuously over time.
Currently, a multi-channel signal is processed with a fixed coding scheme, for example a unified bit allocation scheme, and the multi-channel signal is quantized and encoded according to the bit allocation result. A unified bit allocation scheme is simple and easy to implement, but suffers from low coding efficiency and wasted coding bit resources.
Disclosure of Invention
The embodiments of this application provide an encoding and decoding method, an encoding and decoding device, and a terminal device for a multi-channel signal, which are used to improve encoding efficiency and the utilization of encoding bit resources.
In order to solve the technical problems, the embodiment of the application provides the following technical scheme:
In a first aspect, an embodiment of this application provides a method for encoding a multi-channel signal, including:
acquiring mute flag information of a multi-channel signal, the mute flag information including: a mute enable flag, and/or a mute flag;
performing multi-channel encoding processing on the multi-channel signal to obtain a transmission channel signal of each transmission channel; and
generating a code stream according to the transmission channel signals of the transmission channels and the mute flag information, where the code stream includes: the mute flag information and the multi-channel quantization encoding result of the transmission channel signals of the transmission channels.
In the above scheme, the mute flag information of the multi-channel signal includes: a mute enable flag, and/or a mute flag; multi-channel encoding processing is performed on the multi-channel signal to obtain a transmission channel signal of each transmission channel; and a code stream is generated according to the transmission channel signals of the transmission channels and the mute flag information, the code stream including the mute flag information and the multi-channel quantization encoding result of the transmission channel signals. In the embodiments of this application, the transmission channel signals of the transmission channels are encoded according to the mute flag information to generate the code stream; since the mute condition of the multi-channel signal is taken into account, encoding efficiency and the utilization of encoding bit resources are improved.
In one possible implementation, the multi-channel signal includes: a sound bed signal, and/or an object signal;
the mute flag information includes: the mute enable flag; the mute enable flag includes: a global mute enable flag, or a partial mute enable flag, where
the global mute enable flag is a mute enable flag acting on the entire multi-channel signal; or,
the partial mute enable flag is a mute enable flag acting on some of the channels of the multi-channel signal.
In one possible implementation, when the mute enable flag is the partial mute enable flag,
the partial mute enable flag is an object mute enable flag acting on the object signal; or the partial mute enable flag is a sound bed mute enable flag acting on the sound bed signal; or the partial mute enable flag is a mute enable flag acting on the channel signals of the multi-channel signal other than the low frequency effects (LFE) channel signal; or the partial mute enable flag is a mute enable flag acting on the channel signals of the multi-channel signal that participate in group pairing.
In the above scheme, the global mute enable flag or the partial mute enable flag can provide a mute indication for the sound bed signal and/or the object signal, so that subsequent encoding processing, such as bit allocation, can be performed based on the global mute enable flag or the partial mute enable flag, thereby improving encoding efficiency.
In one possible implementation, the multi-channel signal includes: a sound bed signal and an object signal;
the mute flag information includes: the mute enable flag; the mute enable flag includes: a sound bed mute enable flag and an object mute enable flag,
where the mute enable flag occupies a first bit and a second bit, the first bit carrying the value of the sound bed mute enable flag and the second bit carrying the value of the object mute enable flag.
In the above scheme, different bits can carry different parts of the mute enable flag. For example, a first bit and a second bit are predefined, so that the sound bed mute enable flag and the object mute enable flag can each be indicated through its own bit.
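As an illustration of the two-bit layout described above, the following C sketch packs the two enable flags into the first two bits of a byte. The function and variable names (pack_mute_enable, bedMuteEna, objMuteEna) and the bit order are assumptions for illustration only; the scheme itself does not fix them.

    #include <stdint.h>

    /* Hypothetical two-bit layout: bit 0 carries the sound bed mute enable
     * flag, bit 1 carries the object mute enable flag. The actual bit order
     * in the code stream is not fixed by this scheme. */
    static uint8_t pack_mute_enable(int bedMuteEna, int objMuteEna)
    {
        uint8_t bits = 0;
        bits |= (uint8_t)(bedMuteEna & 1);        /* first bit  */
        bits |= (uint8_t)((objMuteEna & 1) << 1); /* second bit */
        return bits;
    }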
In one possible implementation, the mute flag information includes: the mute enable flag;
the mute enable flag is used to indicate whether the mute flag detection function is enabled; or,
the mute enable flag is used to indicate whether each channel of the multi-channel signal needs to be transmitted; or,
the mute enable flag is used to indicate whether each channel of the multi-channel signal is an unmuted channel.
In the above scheme, the mute enable flag indicates whether the mute detection function is on. For example, when the mute enable flag is a first value (e.g., 1), the mute detection function is turned on, and the mute flag of each channel of the multi-channel signal is further detected; when the mute enable flag is a second value (e.g., 0), the mute detection function is turned off.
In the above scheme, the mute enable flag may also be used to indicate whether each channel of the multi-channel signal is an unmuted channel. For example, when the mute enable flag is a first value (e.g., 1), the mute flag of each channel needs to be further detected; when the mute enable flag is a second value (e.g., 0), each channel of the multi-channel signal is an unmuted channel.
In one possible implementation, the acquiring mute flag information of the multi-channel signal includes:
acquiring the mute flag information according to control signaling input to the encoding device; or,
acquiring the mute flag information according to encoding parameters of the encoding device; or,
performing mute flag detection on each channel of the multi-channel signal to obtain the mute flag information.
In the above scheme, control signaling may be input to the encoding device, and the mute flag information may be determined according to the control signaling; in this case the mute flag information is controlled by external input. Alternatively, the encoding device may hold encoding parameters (also referred to as encoder parameters) used to determine the mute flag information; these may be preset according to encoder parameters such as the encoding rate and the encoding bandwidth. Alternatively, the mute flag information may be determined based on the mute detection result of each channel. The embodiments of this application do not limit how the mute flag information is obtained.
In one possible implementation, the mute flag information includes: the mute enable flag and the mute flag;
the performing mute flag detection on each channel of the multi-channel signal to obtain the mute flag information includes:
performing mute flag detection on each channel of the multi-channel signal to obtain a mute flag of each channel; and
determining the mute enable flag according to the mute flags of the channels.
In the above scheme, the encoding end may detect the mute flag of each channel, where the mute flag of a channel indicates whether that channel carries a mute frame. After the mute flag of each channel is determined, the mute enable flag is determined from the per-channel mute flags; the mute flag information can be generated in this manner.
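One plausible derivation of the mute enable flag from the per-channel mute flags is sketched below in C, under the assumption that the enable flag is set as soon as at least one channel is detected as mute; the names are illustrative and not taken from the scheme.

    /* muteFlag[ch] == 1 marks channel ch as a mute channel.
     * A sketch: enable further per-channel signaling only if any channel
     * of the current frame is mute. */
    static int derive_mute_enable(const int *muteFlag, int numChannels)
    {
        for (int ch = 0; ch < numChannels; ch++) {
            if (muteFlag[ch] == 1)
                return 1; /* at least one mute channel: enable flag = 1 */
        }
        return 0; /* all channels unmuted: per-channel flags need not be sent */
    }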
In one possible implementation, the mute flag information includes: the mute flag; or, the mute flag information includes: the mute enable flag and the mute flag;
where the mute flag indicates whether each channel on which the mute enable flag acts is a mute channel, a mute channel being a channel that does not need to be encoded or a channel that needs to be encoded with low bits.
In the above scheme, when the value of the mute flag is a first value (for example, 1), the channel on which the mute enable flag acts is a mute channel; when the value of the mute flag is a second value (e.g., 0), the channel on which the mute enable flag acts is an unmuted channel. When the value of the mute flag is the first value (e.g., 1), the channel is not encoded, or is encoded with low bits.
In a possible implementation, before the acquiring mute flag information of the multi-channel signal, the method further includes:
preprocessing the multi-channel signal to obtain a preprocessed multi-channel signal, where the preprocessing includes at least one of the following: transient detection, windowing decision, time-frequency transform, frequency-domain noise shaping, time-domain noise shaping, and band extension encoding;
the acquiring mute flag information of the multi-channel signal includes:
performing mute flag detection on the preprocessed multi-channel signal to obtain the mute flag information.
In the above scheme, the encoding efficiency of the multi-channel signal can be improved through the above preprocessing.
In one possible implementation, the method further includes:
preprocessing the multi-channel signal to obtain a preprocessed multi-channel signal, where the preprocessing includes at least one of the following: transient detection, windowing decision, time-frequency transform, frequency-domain noise shaping, time-domain noise shaping, and band extension encoding; and
correcting the mute flag information according to the preprocessed multi-channel signal.
In the above scheme, the mute flag information may be corrected after preprocessing according to the result of the preprocessing. For example, if frequency-domain noise shaping changes the energy of a certain channel of the multi-channel signal, the mute flag detection result of that channel may be adjusted, thereby correcting the mute flag information.
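A minimal sketch of such a correction, under the assumption that it simply re-runs a per-channel detector on the preprocessed signal (declared extern here; a detector of this kind is sketched further below). The names and the re-detection approach are illustrative assumptions, not taken from the scheme.

    /* Assumed per-channel detector, e.g. the one sketched later on. */
    extern int detect_mute_flag(const float *input, int frameLen);

    /* Re-run detection on the preprocessed channel signals so that, e.g.,
     * an energy change caused by frequency-domain noise shaping is
     * reflected in the corrected mute flags. */
    static void correct_mute_flags(float *const *preproc, int numCh,
                                   int frameLen, int *muteFlag)
    {
        for (int ch = 0; ch < numCh; ch++)
            muteFlag[ch] = detect_mute_flag(preproc[ch], frameLen);
    }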
In one possible implementation, the generating a code stream according to the transmission channel signals of the transmission channels and the mute flag information includes:
adjusting an initial multi-channel processing mode according to the mute flag information to obtain an adjusted multi-channel processing mode; and
encoding the multi-channel signal according to the adjusted multi-channel processing mode to obtain the code stream.
In the above scheme, the encoding end can adjust the initial multi-channel processing mode according to the mute flag information, and then encode the multi-channel signal according to the adjusted multi-channel processing mode, thereby improving encoding efficiency. For example, during screening of the multi-channel signal, a channel whose mute flag is 1 does not participate in group-pair screening, as sketched below.
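A minimal sketch of that screening step, under assumed names; channels whose mute flag is 1 are simply excluded from the candidate list for group-pair screening.

    /* Build the list of channels allowed to participate in group-pair
     * screening: a sketch that drops channels flagged as mute. */
    static int screen_channels(const int *muteFlag, int numChannels,
                               int *candidates /* out: channel indices */)
    {
        int n = 0;
        for (int ch = 0; ch < numChannels; ch++) {
            if (muteFlag[ch] == 0)
                candidates[n++] = ch; /* only unmuted channels are paired */
        }
        return n; /* number of candidate channels */
    }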
In one possible implementation, the generating a code stream according to the transmission channel signals of the transmission channels and the mute flag information includes:
performing bit allocation for each transmission channel according to the mute flag information, the number of available bits, and the multi-channel side information to obtain a bit allocation result for each transmission channel; and
encoding the transmission channel signal of each transmission channel according to the bit allocation result of that channel to obtain the code stream.
In this scheme, the encoding end performs bit allocation according to the mute flag information, the number of available bits, and the multi-channel side information, and then encodes according to the bit allocation result of each transmission channel to obtain the encoded code stream. The specific content of the bit allocation policy is not limited. For example, the encoding of the transmission channel signals may be multi-channel quantization encoding. One implementation of the multi-channel quantization encoding is to transform the group-pair downmixed signals with a neural network to obtain latent features, and then quantize the latent features and perform interval (range) encoding. Another implementation is to quantize the group-pair downmixed signals based on vector quantization.
In one possible implementation, the performing bit allocation for each transmission channel according to the mute flag information, the number of available bits, and the multi-channel side information includes:
performing bit allocation for each transmission channel according to the number of available bits and the multi-channel side information, following the bit allocation policy corresponding to the mute flag information.
In the above scheme, bit allocation according to the mute flag information may first be performed according to the total available bits and the signal characteristics of each transmission channel, in combination with a bit allocation policy; the bit allocation result is then adjusted according to the mute flag information, and the adjusted bit allocation improves the transmission efficiency of the multi-channel signal.
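A minimal sketch of such a two-stage allocation, assuming mute channels are capped at a small fixed budget (MUTE_BITS, an illustrative constant) and the reclaimed bits are redistributed evenly among the unmuted channels; the actual allocation policy of the scheme is not specified here.

    #define MUTE_BITS 8 /* illustrative low-bit budget for a mute channel */

    /* First-stage result bits[ch] comes from the usual allocation policy
     * (total bits, signal characteristics, side information). This sketch
     * then reclaims bits from mute channels and gives them to the rest. */
    static void adjust_allocation(int *bits, const int *muteFlag, int numCh)
    {
        int reclaimed = 0, active = 0;
        for (int ch = 0; ch < numCh; ch++) {
            if (muteFlag[ch] == 1 && bits[ch] > MUTE_BITS) {
                reclaimed += bits[ch] - MUTE_BITS;
                bits[ch] = MUTE_BITS;
            } else if (muteFlag[ch] == 0) {
                active++;
            }
        }
        if (active == 0)
            return;
        int share = reclaimed / active; /* even redistribution, for brevity */
        for (int ch = 0; ch < numCh; ch++)
            if (muteFlag[ch] == 0)
                bits[ch] += share;
    }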
In one possible implementation, the multi-channel side information includes: a channel bit allocation proportion field,
where the channel bit allocation proportion field indicates the bit allocation proportion among the non-low-frequency-effects (non-LFE) channels of the multi-channel signal.
In the above scheme, the bit allocation proportion field indicates the bit allocation proportion of all channels of the multi-channel signal except the LFE channel, so that the number of bits of each non-LFE channel can be determined.
In one possible implementation, the performing mute flag detection on each channel of the multi-channel signal includes:
determining the signal energy of each channel of the current frame according to the input signal of each channel of the current frame of the multi-channel signal;
determining a mute detection parameter for each channel of the current frame according to the signal energy of each channel of the current frame; and
determining a mute flag for each channel of the current frame according to the mute detection parameter of each channel of the current frame and a preset mute detection threshold.
In the above scheme, the mute detection parameter of each channel of the current frame is compared with the mute detection threshold. Taking mute flag detection of the first channel of the current frame as an example: if the mute detection parameter of the first channel of the current frame is smaller than the mute detection threshold, the first channel of the current frame is a mute frame, that is, the first channel is currently a mute channel, and the mute flag muteFlag[1] of the first channel of the current frame takes a first value (for example, 1). If the mute detection parameter of the first channel of the current frame is greater than or equal to the mute detection threshold, the first channel of the current frame is a non-mute frame, that is, the first channel is currently a non-mute channel, and the mute flag muteFlag[1] of the first channel of the current frame takes a second value (e.g., 0).
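A minimal C sketch of this per-channel detection, assuming the mute detection parameter is the frame's energy in the log domain and MUTE_THR is an illustrative preset threshold; the actual parameter and threshold of the scheme are not fixed here.

    #include <math.h>

    #define MUTE_THR (-60.0) /* illustrative detection threshold, in dB */

    /* Returns the mute flag for one channel of the current frame:
     * 1 if the detection parameter is below the threshold (mute frame),
     * 0 otherwise (non-mute frame). */
    static int detect_mute_flag(const float *input, int frameLen)
    {
        double energy = 0.0;
        for (int i = 0; i < frameLen; i++)
            energy += (double)input[i] * input[i];     /* signal energy */
        double param = 10.0 * log10(energy / frameLen + 1e-12); /* in dB */
        return (param < MUTE_THR) ? 1 : 0;
    }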
In one possible implementation, the performing multi-channel encoding processing on the multi-channel signal to obtain a transmission channel signal of each transmission channel includes:
screening the channels of the multi-channel signal to obtain a screened multi-channel signal;
performing group-pair processing on the screened multi-channel signal to obtain multi-channel group-pair signals and multi-channel side information; and
performing downmix processing on the multi-channel group-pair signals according to the multi-channel side information to obtain the transmission channel signal of each transmission channel.
In the above scheme, the encoding device screens the multi-channel signal, for example screening out the channels that do not participate in multi-channel group pairing, to obtain the screened multi-channel signal. The screened multi-channel signal may consist of the channels that participate in group pairing; for example, the screened channels do not include the LFE channel. After the screening, the channels can be grouped into pairs, for example ch1 and ch2 form a channel group pair, to obtain the multi-channel group-pair signals. After the multi-channel group-pair signals are generated, downmix processing is performed (the specific downmix processing is not described in detail here), so that the transmission channel signal of each transmission channel is obtained.
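As an illustration of the downmix step, the following sketch mixes one channel group pair (ch1, ch2) into a single transmission channel with a simple mid-style downmix. The actual downmix of the scheme is not specified here, so this is an assumed example that omits the side signal and the ILD handling.

    /* Downmix one channel pair into a mid transmission channel.
     * A sketch only: M = (L + R) / 2 per sample. */
    static void downmix_pair(const float *ch1, const float *ch2,
                             float *mid, int frameLen)
    {
        for (int i = 0; i < frameLen; i++)
            mid[i] = 0.5f * (ch1[i] + ch2[i]);
    }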
In one possible implementation, the multi-channel side information includes at least one of the following: an inter-channel level difference parameter quantization codebook index, a channel group pair count, and a channel pair index;
where the inter-channel level difference parameter quantization codebook index is a codebook index indicating the quantization of the inter-channel level difference (ILD) parameter of each channel of the multi-channel signal,
the channel group pair count represents the number of channel group pairs of the current frame of the multi-channel signal, and
the channel pair index represents the index of a channel pair.
In the above scheme, the number of bits occupied by the inter-channel level difference parameter quantization codebook index is not limited in the embodiments of this application. For example, the index occupies 5 bits; it may be represented as mcIld[ch1] and mcIld[ch2], each occupying 5 bits, and the quantized ILD codebook index of each channel of the current channel pair is used to restore the amplitude of the decoded spectrum. The embodiments of this application likewise do not limit the number of bits occupied by the channel group pair count. For example, the channel group pair count, represented as pairCnt, occupies 4 bits and expresses the number of channel group pairs of the current frame. Nor is the number of bits occupied by the channel pair index limited. For example, the channel pair index is represented as channelPairIndex; the number of bits it occupies is related to the total number of channels, it represents the index of the channel pair, and the index values of the two channels of the current channel pair, ch1 and ch2, can be obtained by parsing it.
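Collecting the fields above into one structure, as a sketch: the field widths noted in the comments follow the examples in the text (5 bits per mcIld entry, 4 bits for pairCnt), while MAX_PAIRS and the exact C types are illustrative assumptions.

    #include <stdint.h>

    #define MAX_PAIRS 16 /* illustrative upper bound on channel group pairs */

    /* Multi-channel side information as described above. Field widths in
     * the code stream (per the examples in the text): each mcIld entry
     * occupies 5 bits, pairCnt occupies 4 bits, and each channelPairIndex
     * occupies a number of bits that depends on the total channel count. */
    typedef struct {
        uint8_t  mcIld[MAX_PAIRS][2];         /* ILD quantization codebook
                                                 index for ch1/ch2 of a pair */
        uint8_t  pairCnt;                     /* number of channel group
                                                 pairs of the current frame */
        uint16_t channelPairIndex[MAX_PAIRS]; /* index identifying each pair */
    } McSideInfo;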
In a second aspect, an embodiment of this application provides a decoding method of a multi-channel signal, including:
parsing mute flag information from a code stream of an encoding device, and determining encoding information of each transmission channel according to the mute flag information, where the mute flag information includes: a mute enable flag, and/or a mute flag;
decoding the encoding information of each transmission channel to obtain a decoded signal of each transmission channel; and
performing multi-channel decoding processing on the decoded signals of the transmission channels to obtain a multi-channel decoded output signal.
In the above scheme, the decoding end can obtain the mute flag information from the code stream of the encoding end, which allows the decoding end to perform decoding processing in a manner consistent with the encoding end.
In one possible implementation, the parsing mute flag information from the code stream of the encoding device includes:
parsing a mute flag of each channel from the code stream; or,
parsing the mute enable flag from the code stream, and if the mute enable flag is a first value, parsing the mute flag from the code stream; or,
parsing a sound bed mute enable flag and/or an object mute enable flag, and the mute flag of each channel, from the code stream; or,
parsing a sound bed mute enable flag and/or an object mute enable flag from the code stream, and, according to the sound bed mute enable flag and/or the object mute enable flag, parsing the mute flags of some of the channels from the code stream.
In the above scheme, the decoding end parses the mute flag information from the code stream of the encoding device; the mute flag information obtained by the decoding end corresponds to the specific content of the mute flag information generated by the encoding device. Specifically, in one mode, the mute flag indicates whether each channel is a mute channel, where a mute channel is a channel that does not need to be encoded or a channel that needs to be encoded with low bits, and the decoding end can parse the mute flag of each channel from the code stream. In another mode, a mute enable flag may be used to indicate whether each channel is an unmuted channel: for example, when the mute enable flag is a first value (e.g., 1), the mute flag of each channel needs to be further parsed, and when the mute enable flag is a second value (e.g., 0), each channel is an unmuted channel; the decoding end parses the mute enable flag from the code stream, and parses the mute flags only if the mute enable flag is the first value, as sketched below. In another mode, the mute enable flag includes a sound bed mute enable flag and/or an object mute enable flag, and the decoding end parses the sound bed mute enable flag and/or the object mute enable flag and the mute flag of each channel from the code stream. In yet another mode, the decoding end parses the sound bed mute enable flag and/or the object mute enable flag from the code stream, and then, according to these enable flags, parses the mute flags of some of the channels from the code stream.
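A sketch of the second parsing variant above (read the mute enable flag; only if it takes the first value, read the per-channel mute flags), assuming a hypothetical one-bit reader read_bit() over the code stream.

    /* Hypothetical bit reader: returns the next bit of the code stream. */
    extern int read_bit(void *bs);

    /* Parse mute flag information, one variant from the list above.
     * If the enable flag is 0, every channel is treated as unmuted and
     * no per-channel flags are present in the code stream. */
    static void parse_mute_info(void *bs, int numChannels,
                                int *muteEnable, int *muteFlag)
    {
        *muteEnable = read_bit(bs);
        for (int ch = 0; ch < numChannels; ch++)
            muteFlag[ch] = (*muteEnable == 1) ? read_bit(bs) : 0;
    }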
In one possible implementation, the decoding the encoding information of each transmission channel includes:
parsing multi-channel side information from the code stream;
performing bit allocation for each transmission channel according to the multi-channel side information and the mute flag information to obtain the number of encoded bits of each channel; and
decoding the encoding information of each transmission channel according to the number of encoded bits of that channel.
In the above scheme, the code stream may further include multi-channel side information. The decoding end may perform bit allocation for each transmission channel according to the multi-channel side information and the mute flag information to obtain the number of encoded bits of each transmission channel; the number obtained at the decoding end is the same as that determined at the encoding end. The encoding information of each transmission channel is then decoded according to the number of encoded bits of that channel, thereby decoding the transmission channel signals of the transmission channels.
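A sketch of this decode loop, under the assumption that a channel whose mute flag is set and whose bit budget is zero was not encoded and is reconstructed as silence; decode_channel, FRAME_LEN, and the zero-bit convention are illustrative assumptions.

    #define FRAME_LEN 960 /* illustrative frame length in samples */

    /* Hypothetical per-channel decoder driven by the allocated bit count. */
    extern void decode_channel(void *bs, int numBits, float *out, int len);

    static void decode_transport(void *bs, int numCh, const int *bits,
                                 const int *muteFlag,
                                 float out[][FRAME_LEN])
    {
        for (int ch = 0; ch < numCh; ch++) {
            if (muteFlag[ch] == 1 && bits[ch] == 0) {
                for (int i = 0; i < FRAME_LEN; i++)
                    out[ch][i] = 0.0f; /* unencoded mute channel: silence */
            } else {
                decode_channel(bs, bits[ch], out[ch], FRAME_LEN);
            }
        }
    }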
In one possible implementation, after the performing multi-channel decoding processing on the decoded signals of the transmission channels to obtain a multi-channel decoded output signal, the method further includes:
post-processing the multi-channel decoded output signal, the post-processing including at least one of the following: band extension decoding, inverse time-domain noise shaping, inverse frequency-domain noise shaping, and inverse time-frequency transform.
In the above scheme, the post-processing of the multi-channel decoded output signal is the inverse of the preprocessing at the encoding end, and the specific processing manner is not limited.
In one possible implementation, the multi-channel side information includes at least one of the following: an inter-channel level difference parameter quantization codebook index, a channel group pair count, and a channel pair index;
where the inter-channel level difference parameter quantization codebook index is a codebook index indicating the quantization of the inter-channel level difference (ILD) parameter of each channel,
the channel group pair count represents the number of channel group pairs of the current frame of the multi-channel signal, and
the channel pair index represents the index of a channel pair.
In a third aspect, an embodiment of this application provides an encoding device, including:
a mute flag detection module, configured to acquire mute flag information of a multi-channel signal, the mute flag information including: a mute enable flag, and/or a mute flag;
a multi-channel encoding module, configured to perform multi-channel encoding processing on the multi-channel signal to obtain a transmission channel signal of each transmission channel; and
a code stream generation module, configured to generate a code stream according to the transmission channel signals of the transmission channels and the mute flag information, where the code stream includes: the mute flag information and the multi-channel quantization encoding result of the transmission channel signals.
In a fourth aspect, an embodiment of this application provides a decoding device, including:
a parsing module, configured to parse mute flag information from a code stream of an encoding device and to determine encoding information of each transmission channel according to the mute flag information, where the mute flag information includes: a mute enable flag, and/or a mute flag;
an inverse quantization module, configured to decode the encoding information of each transmission channel to obtain a decoded signal of each transmission channel; and
a multi-channel decoding module, configured to perform multi-channel decoding processing on the decoded signals of the transmission channels to obtain a multi-channel decoded output signal.
In a fifth aspect, embodiments of the present application provide a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the method of the first or second aspects described above.
In a sixth aspect, embodiments of the present application provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first or second aspect described above.
In a seventh aspect, an embodiment of the present application provides a communication apparatus, where the communication apparatus may include an entity such as a terminal device or a chip, and the communication apparatus includes: a processor, a memory; the memory is used for storing instructions; the processor is configured to execute the instructions in the memory to cause the communication device to perform the method of any one of the preceding first or second aspects.
In an eighth aspect, embodiments of the present application provide a computer readable storage medium having stored therein a code stream generated by the method of the first aspect.
In a ninth aspect, this application provides a chip system, including a processor configured to support a codec device in performing the functions involved in the above aspects, for example, transmitting or processing the data and/or information involved in the above methods. In one possible design, the chip system further includes a memory configured to store the program instructions and data necessary for the codec device. The chip system may consist of chips, or may include chips and other discrete devices.
Drawings
Fig. 1 is a schematic diagram of a composition structure of a multi-channel signal processing system according to an embodiment of the present application;
fig. 2a is a schematic diagram of an audio encoder and an audio decoder applied to a terminal device according to an embodiment of the present application;
fig. 2b is a schematic diagram of an audio encoder according to an embodiment of the present application applied to a wireless device or a core network device;
fig. 2c is a schematic diagram of an audio decoder applied to a wireless device or a core network device according to an embodiment of the present application;
fig. 3a is a schematic diagram of a multi-channel encoder and a multi-channel decoder according to an embodiment of the present application applied to a terminal device;
fig. 3b is a schematic diagram of a multi-channel encoder according to an embodiment of the present application applied to a wireless device or a core network device;
fig. 3c is a schematic diagram of a multi-channel decoder according to an embodiment of the present application applied to a wireless device or a core network device;
fig. 4 is a schematic diagram of a method for encoding a multi-channel signal according to an embodiment of the present application;
fig. 5 is a schematic diagram of a decoding method of a multi-channel signal according to an embodiment of the present application;
fig. 6 is a schematic diagram of a coding flow of a multi-channel signal according to an embodiment of the present application;
fig. 7 is a schematic diagram of a coding flow of a multi-channel signal according to an embodiment of the present application;
fig. 8 is a schematic diagram of a decoding process of a multi-channel signal according to an embodiment of the present application;
fig. 9 is a schematic diagram of a decoding process of a multi-channel signal according to an embodiment of the present application;
fig. 10 is a schematic diagram of a composition structure of an encoding apparatus according to an embodiment of the present application;
fig. 11 is a schematic diagram of a composition structure of a decoding device according to an embodiment of the present application;
fig. 12 is a schematic diagram of a composition structure of another encoding apparatus according to an embodiment of the present application;
fig. 13 is a schematic diagram of a composition structure of another decoding apparatus according to an embodiment of the present application.
Detailed Description
The embodiments of this application provide an encoding and decoding method for a multi-channel signal, a terminal device, and a network device, which are used to improve encoding efficiency and the utilization of encoding bit resources.
Embodiments of the present application are described below with reference to the accompanying drawings.
The terms "first", "second", and the like in the specification, the claims, and the above drawings are used to distinguish between similar elements and are not necessarily used to describe a particular sequence or chronological order. It should be understood that terms so used are interchangeable under appropriate circumstances, and serve merely to distinguish, in the description of the embodiments of the application, between objects having the same attributes. Furthermore, the terms "comprise" and "have", and any variations thereof, are intended to cover a non-exclusive inclusion, so that a process, method, system, product, or device that includes a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to the process, method, product, or device.
Sound is a continuous wave generated by the vibration of an object. An object that vibrates and emits sound waves is called a sound source. As sound waves propagate through a medium (e.g., air, a solid, or a liquid), the hearing of a human or animal can perceive the sound.
The characteristics of sound waves include pitch, intensity, and timbre. Pitch indicates how high or low a sound is. Intensity indicates how loud a sound is; it may also be referred to as loudness or volume, and its unit is the decibel (dB). Timbre is also known as sound quality.
The frequency of a sound wave determines the pitch: the higher the frequency, the higher the pitch. The number of times an object vibrates in one second is called the frequency, whose unit is hertz (Hz). The frequencies of sound recognizable by the human ear lie between 20 Hz and 20000 Hz.
The amplitude of a sound wave determines the intensity: the larger the amplitude, the greater the intensity. The closer to the sound source, the greater the intensity.
The waveform of a sound wave determines the timbre. Waveforms of sound waves include square waves, sawtooth waves, sine waves, pulse waves, and the like.
According to the characteristics of the sound waves, sounds can be classified into regular sounds and irregular sounds. An irregular sound is a sound emitted by an irregularly vibrating source, for example noise that disturbs people's work, study, or rest. A regular sound is a sound emitted by a regularly vibrating source; regular sounds include speech and musical tones. Represented electrically, a regular sound is an analog signal varying continuously in the time-frequency domain; such an analog signal may be referred to as an audio signal. An audio signal is an information carrier that carries speech, music, and sound effects.
Since human hearing has the ability to discern the position distribution of sound sources in space, the listener can perceive the azimuth of the sound in addition to the pitch, intensity and timbre of the sound when hearing the sound in space.
Sound can also be classified into mono and stereo. Mono has a single sound channel: sound is picked up by one microphone and played by one speaker. Stereo has multiple sound channels, and different channels transmit different sound waveforms. A sound channel may also simply be called a channel; for example, the multi-channel signal includes the signals of several sound channels, which may also be referred to as channels, and the two terms have the same meaning in the following embodiments of this application. When the multi-channel signal is multi-channel encoded, a transmission channel signal of each transmission channel is obtained; a transmission channel is a channel after multi-channel encoding. Further, the multi-channel encoding may include channel group pairing and downmix processing, so a transmission channel may also be referred to as a group-paired and downmixed channel. See the description of the multi-channel encoding process in the subsequent embodiments for details.
The embodiments of this application are applied to the field of audio encoding and decoding, and in particular to multi-channel encoding. The multi-channel encoding may encode a sound bed signal having a plurality of channels, such as 5.1 channels, 5.1.4 channels, 7.1 channels, 7.1.4 channels, or 22.2 channels. The multi-channel encoding may also encode a plurality of object audio signals, or a mixed signal containing both a sound bed signal and object audio signals.
Here, the 5.1 channels include: a center channel (C), a front left channel (L), a front right channel (R), a rear left surround channel (LS), a rear right surround channel (RS), and a low frequency effects (LFE, the "0.1") channel.
The 5.1.4 channels add the following channels to the 5.1 channels: a left high channel, a right high channel, a left high surround channel, and a right high surround channel.
The 7.1 channels include: a center channel (C), a front left channel (L), a front right channel (R), a rear left surround channel (LS), a rear right surround channel (RS), a left back channel (LB), a right back channel (RB), and an LFE (0.1) channel.
The 7.1.4 channels add 4 height channels to the 7.1 channels.
The 22.2 channels are a multi-channel format comprising 22 channels arranged in three layers plus 2 LFE channels.
A mixed signal of a sound bed signal and object signals is a signal combination used in three-dimensional audio, and jointly meets the audio recording, transmission, and playback requirements of complex scenes such as film production, sports events, and concerts. For example, in the rebroadcast of a sports event, the on-site sound content is typically represented by a sound bed signal, while the commentary of different commentators is typically represented by multiple audio objects. Whether for a sound bed signal, an object signal, or a mixed signal containing both, the characteristics of the input signals of different channels at the same moment are not exactly the same, and the characteristics of the input signal of the same channel also change constantly at different moments.
Currently, multi-channel signals are encoded with a fixed coding scheme that does not consider the differences in input signal characteristics between different moments and/or different channels; for example, a unified bit allocation scheme is used, and the multi-channel signal is quantized and encoded according to the bit allocation result.
The same bit allocation scheme cannot adapt to the changing characteristics of the input signals of different channels at different moments, so coding efficiency is low. For example, suppose the multi-channel audio signal to be encoded contains a 5.1.4-channel sound bed signal and 4 object signals. Among the 14 channels to be encoded, channels 0-9 belong to the sound bed signal and channels 10-13 belong to the object signals. At some moment, channels 6-9 and channels 11, 12, and 13 are mute channels (audibly, they carry little perceptible information), while the other channels contain the primary audio information, i.e., they are non-mute channels. At another moment, the mute channels become channels 10, 12, and 13, with the other channels containing the primary audio information.
If the same bit allocation scheme is used at these different moments, some channels containing primary audio information may not receive enough bits to be encoded well, while some mute channels are allocated an excessive number of coding bits, resulting in wasted coding bit resources.
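To make the waste concrete, here is a small worked example with assumed numbers: suppose 1400 bits are available per frame for the 14 channels above. A uniform scheme gives every channel 100 bits, so the 7 mute channels consume 700 bits that carry little perceptible information. If each mute channel instead received, say, 8 bits, the 7 mute channels would use 56 bits in total, leaving 1344 bits, i.e., about 192 bits for each of the 7 non-mute channels, nearly double their uniform share. All figures are illustrative only.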
The embodiments of this application provide audio processing technology, and in particular audio encoding technology for a multi-channel signal, i.e., an audio signal including a plurality of channels (for example, the multi-channel signal may be a stereo signal), so as to improve conventional audio coding systems. Audio processing comprises audio encoding and audio decoding. Audio encoding is performed at the source side and includes encoding (e.g., compressing) the original audio to reduce the amount of data needed to represent it, for more efficient storage and/or transmission. Audio decoding is performed at the destination side and includes processing, inverse to that of the encoder, to reconstruct the original audio. The encoding part and the decoding part are collectively referred to as codec. The implementation of the embodiments of this application is described in detail below with reference to the drawings.
The technical solution of the embodiments of this application can be applied to various audio processing systems. Fig. 1 is a schematic diagram of the composition of an audio processing system according to an embodiment of this application. The audio processing system 100 may include: an encoding device 101 for a multi-channel signal and a decoding device 102 for a multi-channel signal. The encoding device 101, which may also be referred to as an audio encoding device, may be configured to generate a code stream; the audio-encoded code stream may be transmitted to the decoding device 102 through an audio transmission channel. The decoding device 102, which may also be referred to as an audio decoding device, may receive the code stream, then perform its audio decoding function, and finally obtain a reconstructed signal.
In the embodiment of the application, the encoding device of the multi-channel signal can be applied to various terminal devices with audio communication requirements, wireless devices with transcoding requirements and core network devices, for example, the encoding device of the multi-channel signal can be an audio encoder of the terminal device or the wireless device or the core network device. Also, the decoding apparatus of the multi-channel signal may be applied to various terminal devices having audio communication requirements, wireless devices having transcoding requirements, and core network devices, and for example, the decoding apparatus of the multi-channel signal may be an audio decoder of the above terminal device or the wireless device or the core network device. For example, the audio encoder may include a media gateway of a radio access network, a core network, a transcoding device, a media resource server, a mobile terminal, a fixed network terminal, etc., and may also be an audio encoder applied in a Virtual Reality (VR) streaming media (streaming) service.
In an embodiment of this application, taking an audio codec module (audio encoding and audio decoding) applicable to a virtual reality streaming (VR streaming) service as an example, the end-to-end audio signal processing chain is as follows. The audio signal A passes through an acquisition module and then undergoes preprocessing (audio processing), which includes filtering out the low-frequency part of the signal (with 20 Hz or 50 Hz as the cut-off point) and extracting the directional information from the signal. The signal is then encoded, packaged (file/segment encapsulation), and delivered to the decoding end. The decoding end first unpacks (file/segment decapsulation) and then decodes (audio decoding) the signal, performs binaural rendering (audio rendering) on the decoded signal, and maps the rendered signal to the listener's headphones, which may be standalone headphones or headphones on a glasses device.
As shown in fig. 2a, this is a schematic diagram of an audio encoder and an audio decoder according to an embodiment of the present application applied to terminal devices. Each terminal device may include: an audio encoder, a channel encoder, an audio decoder, and a channel decoder. Specifically, the channel encoder performs channel encoding on the audio signal, and the channel decoder performs channel decoding on the audio signal. For example, the first terminal device 20 may include: a first audio encoder 201, a first channel encoder 202, a first audio decoder 203, and a first channel decoder 204. The second terminal device 21 may include: a second audio decoder 211, a second channel decoder 212, a second audio encoder 213, and a second channel encoder 214. The first terminal device 20 is connected to a wireless or wired first network communication device 22, the first network communication device 22 is connected to a wireless or wired second network communication device 23 through a digital channel, and the second terminal device 21 is connected to the wireless or wired second network communication device 23. The wireless or wired network communication devices may collectively be referred to as signal transmission devices, such as communication base stations or data switching devices.
In audio communication, a terminal device acting as the transmitting end first performs audio acquisition, audio-encodes the acquired audio signal, performs channel encoding, and then transmits the signal in a digital channel through a wireless network or a core network. The terminal device acting as the receiving end performs channel decoding on the received signal to obtain the code stream, recovers the audio signal through audio decoding, and plays back the audio.
As shown in fig. 2b, this is a schematic diagram of an audio encoder according to an embodiment of the present application applied to a wireless device or core network device. The wireless device or core network device 25 includes: a channel decoder 251, another audio decoder 252, the audio encoder 253 provided by the embodiment of this application, and a channel encoder 254, where the other audio decoder 252 refers to an audio decoder other than the one corresponding to the audio encoder provided herein. In the wireless device or core network device 25, the incoming signal is first channel-decoded by the channel decoder 251, then audio-decoded by the other audio decoder 252, then audio-encoded by the audio encoder 253 provided by the embodiment of this application, and finally channel-encoded by the channel encoder 254 before transmission. The other audio decoder 252 audio-decodes the code stream obtained by the channel decoder 251.
As shown in fig. 2c, this is a schematic diagram of an audio decoder according to an embodiment of the present application applied to a wireless device or core network device. The wireless device or core network device 25 includes: a channel decoder 251, the audio decoder 255 provided by the embodiment of this application, another audio encoder 256, and a channel encoder 254, where the other audio encoder 256 refers to an audio encoder other than the one corresponding to the audio decoder provided herein. In the wireless device or core network device 25, the incoming signal is first channel-decoded by the channel decoder 251, then the received audio code stream is decoded by the audio decoder 255, then audio-encoded by the other audio encoder 256, and finally the audio signal is channel-encoded by the channel encoder 254 before transmission. In a wireless device or core network device, if transcoding needs to be implemented, the corresponding audio encoding and decoding processing needs to be performed. The wireless device refers to the radio-frequency-related devices in communication, and the core network device refers to the core-network-related devices in communication.
In some embodiments of the present application, the encoding apparatus of the multi-channel signal may be applied to various terminal devices with audio communication requirements, wireless devices with transcoding requirements, and core network devices, for example, the encoding apparatus of the multi-channel signal may be a multi-channel encoder of the above terminal device or the wireless device or the core network device. Also, the decoding apparatus of the multi-channel signal may be applied to various terminal devices having audio communication requirements, wireless devices having transcoding requirements, and core network devices, and for example, the decoding apparatus of the multi-channel signal may be a multi-channel decoder of the above terminal devices or wireless devices or core network devices.
As shown in fig. 3a, this is a schematic diagram of a multi-channel encoder and a multi-channel decoder according to an embodiment of the present application applied to terminal devices. Each terminal device may include: a multi-channel encoder, a channel encoder, a multi-channel decoder, and a channel decoder. The multi-channel encoder can perform the audio encoding method provided by the embodiment of this application, and the multi-channel decoder can perform the audio decoding method provided by the embodiment of this application. Specifically, the channel encoder performs channel encoding on the multi-channel signal, and the channel decoder performs channel decoding on the multi-channel signal. For example, the first terminal device 30 may include: a first multi-channel encoder 301, a first channel encoder 302, a first multi-channel decoder 303, and a first channel decoder 304. The second terminal device 31 may include: a second multi-channel decoder 311, a second channel decoder 312, a second multi-channel encoder 313, and a second channel encoder 314. The first terminal device 30 is connected to a wireless or wired first network communication device 32, the first network communication device 32 is connected to a wireless or wired second network communication device 33 through a digital channel, and the second terminal device 31 is connected to the wireless or wired second network communication device 33. The wireless or wired network communication devices may collectively be referred to as signal transmission devices, such as communication base stations or data switching devices. A terminal device acting as the transmitting end in audio communication performs multi-channel encoding on the acquired multi-channel signal, performs channel encoding, and then transmits in a digital channel through a wireless network or a core network. The terminal device acting as the receiving end performs channel decoding on the received signal to obtain the multi-channel signal code stream, then recovers the multi-channel signal through multi-channel decoding, and the multi-channel signal is played back by the receiving-end terminal device.
As shown in fig. 3b, which is a schematic diagram of a multi-channel encoder provided in an embodiment of the present application applied to a wireless device or a core network device, the wireless device or core network device 35 includes: a channel decoder 351, another audio decoder 352, a multi-channel encoder 353, and a channel encoder 354, which are similar to those in fig. 2b described above and are not described here again.
As shown in fig. 3c, which is a schematic diagram of a multi-channel decoder according to an embodiment of the present application applied to a wireless device or a core network device, the wireless device or core network device 35 includes: a channel decoder 351, a multi-channel decoder 355, another audio encoder 356, and a channel encoder 354, which are similar to those in fig. 2c described above and are not described here again.
The audio encoding process may be a part of a multi-channel encoder, and the audio decoding process may be a part of a multi-channel decoder. For example, multi-channel encoding may be performed on the acquired multi-channel signal: the acquired multi-channel signal is processed to obtain an audio signal, and the obtained audio signal is encoded according to the method provided by the embodiment of the present application. The decoding end decodes the multi-channel signal encoded code stream to obtain an audio signal, and recovers the multi-channel signal after upmixing processing. Therefore, the embodiment of the present application can also be applied to multi-channel encoders and multi-channel decoders in terminal devices, wireless devices, and core network devices. In a wireless device or core network device, if transcoding is to be implemented, a corresponding multi-channel encoding process is required.
First, a method for encoding a multi-channel signal according to an embodiment of the present application is described, where the method may be performed by a terminal device, for example, the terminal device may be an encoding apparatus (hereinafter referred to as an encoding end or an encoder or an encoding device, for example, the encoding end may be an artificial intelligence (artificial intelligence, AI) encoder) of the multi-channel signal. The multi-channel signal in the embodiment of the present application may include a plurality of channels, for example, a first channel and a second channel, or the plurality of channels may include a first channel, a second channel, a third channel, and so on. As shown in fig. 4, an encoding process performed by the encoding device (or referred to as an encoding end) in the embodiment of the present application is described:
401. acquiring mute flag information of a multi-channel signal, the mute flag information including: a mute enable flag, and/or a mute flag.
After the multi-channel signal is input to the encoding end, the mute flag information of the multi-channel signal can be obtained. The mute flag information may indicate the mute condition of the channels in the multi-channel signal. For example, mute flag detection is performed on the multi-channel signal to detect whether channels of the multi-channel signal are muted, and the encoding end may generate the mute flag information according to the multi-channel signal. The mute flag information may be used to guide subsequent encoding processes, such as bit allocation. The mute flag information can also be written into the code stream by the encoding end and transmitted to the decoding end, thereby ensuring the consistency of encoding and decoding processing.
The mute flag information in the embodiment of the present application is used to indicate the mute status of the multi-channel signal, and has various implementations; for example, the mute flag information may include a mute enable flag and/or a mute flag. The mute enable flag is used to indicate whether mute detection is turned on, and the mute flag is used to indicate whether each channel of the multi-channel signal is a mute frame.
In some embodiments of the present application, the multi-channel signal comprises a sound bed signal and/or an object signal. Existing coding schemes do not consider differences in input signal characteristics between different time instants and/or different channels, and process all channels with a unified coding scheme, so the coding efficiency is low. The mute enable flag provided by the embodiment of the present application can provide a mute indication for the sound bed signal and/or the object signal. Specifically, the mute flag information includes: a mute enable flag; the mute enable flag includes: a global mute enable flag, or a partial mute enable flag, where
the global mute enable flag is a mute enable flag acting on the entire multi-channel signal; or
the partial mute enable flag is a mute enable flag acting on partial channels of the multi-channel signal.
The mute enable flag is denoted HasSilFlag, and may be a global mute enable flag or a partial mute enable flag. By means of the global mute enable flag or the partial mute enable flag, a mute indication can be provided for the sound bed signal and/or the object signal, so that subsequent encoding processing, such as bit allocation, can be performed based on the global mute enable flag or the partial mute enable flag, and coding efficiency can be improved.
In some specific implementations, when the mute enable flag is a partial mute enable flag,
the partial mute enable flag is an object mute enable flag acting on an object signal, or the partial mute enable flag is a sound bed mute enable flag acting on a sound bed signal, or the partial mute enable flag is a mute enable flag acting on the channels of the multi-channel signal other than the low frequency effects (Low Frequency Effects, LFE) channel, or the partial mute enable flag is a mute enable flag acting on the channel signals participating in group pairing in the multi-channel signal.
For example, the global mute enable flag acts on all channels, and the partial mute enable flag acts on partial channels. For example, the object mute enable flag is applied to the channels corresponding to the object signal in the multi-channel signal, and the sound bed mute enable flag is applied to the channels corresponding to the sound bed signal in the multi-channel signal. For example, an object mute enable flag acting only on the object signal in the multi-channel signal is denoted objMuteEna. As another example, a sound bed mute enable flag acting only on the sound bed signal in the multi-channel signal is denoted bedMuteEna.
For example, the global mute enable flag is a mute enable flag acting on the multi-channel signal: when the multi-channel signal only comprises the sound bed signal, the global mute enable mark is a mute enable mark acting on the sound bed signal; when the multichannel signal only comprises an object signal, the global mute enable mark is a mute enable mark acting on the object signal; when the multi-channel signal contains a sound bed signal and an object signal, the global silence enable flag is a silence enable flag acting on the sound bed signal and the object signal.
The partial mute enable flag is a mute enable flag acting on a partial channel in the multi-channel signal, and the partial channel is preset, for example, the partial mute enable flag is an object mute enable flag acting on the object signal, or the partial mute enable flag is a sound bed mute enable flag acting on the sound bed signal, or the partial mute enable flag is a mute enable flag acting on other channel signals in the multi-channel signal that do not include LFE channel signals. The partial mute enable flag is a mute enable flag for a channel signal acting on a participating group pair in a multi-channel signal. The specific manner of performing the group-pairing process on the multi-channel signal in the embodiment of the application is not limited.
In some embodiments of the present application, the multi-channel signal includes: a sound bed signal and an object signal;
the mute flag information includes: a mute enable flag; the mute enable flag includes: a sound bed mute enable flag and an object mute enable flag,
wherein the mute enable flag occupies a first bit and a second bit, the first bit being used to carry the value of the sound bed mute enable flag and the second bit being used to carry the value of the object mute enable flag.
The mute enable flag may use different bits to indicate its specific meaning; for example, a first bit and a second bit are predefined, where the first bit carries the value of the sound bed mute enable flag and the second bit carries the value of the object mute enable flag. Through the different bits, the mute enable flag can be indicated as the sound bed mute enable flag or the object mute enable flag.
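For illustration only, a minimal sketch of packing and unpacking the two enable flags under the bit layout just described; treating the first bit as the least significant bit is an assumption of this sketch, and the function names are hypothetical:

/* Hypothetical packing of the two mute enable flags into one HasSilFlag field. */
/* The bit order (bit 0 = sound bed, bit 1 = object) is an assumption.          */
unsigned int packSilEnable(unsigned int bedMuteEna, unsigned int objMuteEna)
{
    return (bedMuteEna & 0x1u) | ((objMuteEna & 0x1u) << 1);
}

void unpackSilEnable(unsigned int hasSilFlag, unsigned int *bedMuteEna, unsigned int *objMuteEna)
{
    *bedMuteEna = hasSilFlag & 0x1u;        /* first bit: sound bed mute enable flag */
    *objMuteEna = (hasSilFlag >> 1) & 0x1u; /* second bit: object mute enable flag   */
}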
In some embodiments of the present application, step 401 acquires mute flag information of a multi-channel signal, including:
A1, acquiring the mute flag information according to control signaling input into the encoding device; or
A2, acquiring the mute flag information according to coding parameters of the encoding device; or
A3, performing mute flag detection on each channel of the multi-channel signal to obtain the mute flag information.
The encoding device may receive input control signaling and determine the mute flag information according to the control signaling; that is, the mute flag information may be controlled by an external input. Alternatively, the encoding device may include coding parameters (also referred to as encoder parameters) that are used to determine the mute flag information; these may be preset according to encoder parameters such as the coding rate and the coding bandwidth. Alternatively, the mute flag information may be determined based on the mute detection result of each channel. The implementation manner of the mute flag information in the embodiment of the present application is not limited.
In some embodiments of the present application, the mute flag information includes: a mute enable flag;
the mute enable flag is used for indicating whether the mute flag detection function is turned on; or
the mute enable flag is used for indicating whether the mute flag of each channel of the multi-channel signal needs to be transmitted; or
the mute enable flag is used for indicating whether each channel of the multi-channel signal is a non-mute channel.
The mute enable flag is used to indicate whether mute detection is turned on. For example, when the mute enable flag is a first value (e.g., 1), the mute detection function is turned on, and the mute flag of each channel is further detected. When the mute enable flag is a second value (e.g., 0), it indicates that the mute detection function is turned off. Alternatively, the mute enable flag may be used to indicate whether each channel is a non-mute channel. For example, when the mute enable flag is a first value (e.g., 1), it indicates that the mute flag of each channel needs to be further detected. When the mute enable flag is a second value (e.g., 0), it indicates that each channel is a non-mute channel.
In some embodiments of the present application, the mute flag information includes: a mute enable flag and a mute flag;
step A3 of performing mute flag detection on each channel of the multi-channel signal to obtain the mute flag information includes:
A31, performing mute flag detection on each channel of the multi-channel signal to obtain the mute flag of each channel;
A32, determining the mute enable flag according to the mute flags of the channels.
The encoding end may detect the mute flag of each channel, where the mute flag of each channel is used to indicate whether the channel is a mute frame. The mute flag of each channel is denoted muteFlag[ch], where ch is the channel number, ch = 0 … N-1, and N is the total number of channels of the input signal to be encoded; the number of channels of the sound bed signal is M, the number of channels of the object signal is P, and the total number of channels N = M + P. For example, the signal to be encoded is a mixed signal including a sound bed signal and an object signal, where the sound bed signal is a 5.1.4 channel signal, so the channel number M = 10 for the sound bed signal; the number of object signals is 4, so the channel number P = 4 for the object signals; and the total number of channels is 14. The channel numbers of the sound bed signal are from 0 to 9, and the channel numbers of the object signals are from 10 to 13. The mute flag muteFlag[ch], ch = 0 … N-1, corresponds to the mute flag of each channel and is used to indicate whether each channel is a mute channel. After the mute flag of each channel is determined, the mute enable flag is determined from the mute flags of the channels.
In some embodiments of the present application, the mute flag information includes: a mute flag; alternatively, the mute flag information includes: a mute enable flag and a mute flag;
and the mute flag is used for indicating whether each channel acted on by the mute enable flag is a mute channel, where a mute channel is a channel that does not need to be encoded or a channel that only needs to be encoded with a low number of bits.
For example, the channel numbers of the sound bed signal are from 0 to 9, and the channel numbers of the object signals are from 10 to 13. The mute flag muteFlag[ch], ch = 0 … N-1, corresponds to the mute flag of each channel and is used to indicate whether each channel acted on by the mute enable flag is a mute channel. A mute channel is a channel whose signal energy, decibel value, or loudness is below the auditory threshold; it is a channel that does not need to be encoded or only needs to be encoded with a low number of bits. When the value of the mute flag is a first value (e.g., 1), it indicates that the channel is a mute channel; when the value of the mute flag is a second value (e.g., 0), it indicates that the channel is a non-mute channel. When the value of the mute flag is the first value (e.g., 1), the channel is not encoded or is encoded with a low number of bits.
In some embodiments of the present application, step A3 of performing mute flag detection on each channel of the multi-channel signal includes:
B1, determining the signal energy of each channel of the current frame according to the input signals of each channel of the current frame of the multi-channel signal.
According to the input signals of each channel of the current frame, the signal energy of each channel of the current frame is determined, and the value of the frame length is not limited in the embodiment of the application.
B2, determining the mute detection parameter of each channel of the current frame according to the signal energy of each channel of the current frame.
The mute detection parameter of each channel of the current frame is used to represent the energy value, power value, decibel value, or loudness value of each channel signal of the current frame.
And B3, determining the mute flag of each channel of the current frame according to the mute detection parameter of each channel of the current frame and a preset mute detection threshold.
The mute detection parameter of each channel of the current frame is compared with the mute detection threshold. Taking mute flag detection of the first channel of the current frame as an example: if the mute detection parameter of the first channel of the current frame is smaller than the mute detection threshold, the first channel of the current frame is a mute frame, that is, the first channel at the current moment is a mute channel, and the mute flag muteFlag[1] of the first channel of the current frame is a first value (for example, 1). If the mute detection parameter of the first channel of the current frame is greater than or equal to the mute detection threshold, the first channel of the current frame is a non-mute frame, that is, the first channel at the current moment is a non-mute channel, and the mute flag muteFlag[1] of the first channel of the current frame is a second value (for example, 0).
402. Performing multi-channel encoding processing on the multi-channel signal to obtain the transmission channel signal of each transmission channel.
In the embodiment of the present application, the encoding device may perform multi-channel encoding processing on the multi-channel signal, and the multi-channel encoding process may be various, which is described in detail in the following embodiments, and through the encoding process, the transmission channel signal of each transmission channel may be obtained.
One specific implementation of multi-channel quantization encoding is to transform the signals after group pair down-mixing through a neural network to obtain latent features, and then quantize the latent features and perform range (interval) coding on them. Another specific implementation of multi-channel quantization encoding is to quantize the group pair down-mixed signals based on vector quantization. The embodiment of the present application is not limited thereto.
In some embodiments of the present application, step 402 performs a multi-channel encoding process on a multi-channel signal to obtain a transmission channel signal of each transmission channel, including:
and C1, carrying out multi-channel signal screening on the multi-channel signals to obtain screened multi-channel signals.
For example, the encoding device completes the screening of the multi-channel signal, and the screened signal is a multi-channel signal participating in the pairing, for example, the screened channel does not include LFE channels, and the specific screening mode is not limited.
And C2, performing group pair processing on the screened multi-channel signals to obtain multi-channel group pair signals and multi-channel side information.
For example, the encoding device may screen the multi-channel signals, and the screened multi-channel signals are those participating in group pairing. After the screening of the multi-channel signals is completed, the multi-channel signals may be further grouped into pairs, for example, channels ch1 and ch2 form a channel group pair, so as to obtain the multi-channel group pair signals. The embodiment of the present application does not limit the specific method of the group pair processing. The multi-channel side information includes at least one of: an inter-channel amplitude difference parameter quantization codebook index, a channel group pair count, and a channel pair index. The inter-channel amplitude difference parameter quantization codebook index is used to indicate the quantized codebook index of the inter-channel level difference (Interaural Level Difference, ILD) parameter of each channel in each channel pair of the multi-channel signal; the channel group pair count is used to represent the number of channel group pairs of the current frame of the multi-channel signal; and the channel pair index is used to represent the index of a channel pair.
And C3, carrying out down-mixing processing on the multi-channel group pair signals according to the multi-channel side information so as to obtain transmission channel signals of all the transmission channels.
After the multi-channel group pair signals and the multi-channel side information are generated, the multi-channel group pair signals may be subjected to down-mixing processing using the multi-channel side information; the specific down-mixing process is not described in detail here. Through the foregoing group pairing and down-mixing, the transmission channel signal of each transmission channel after down-mixing of the multi-channel group pairs can be obtained, where a transmission channel specifically refers to a channel after multi-channel group pairing and down-mixing.
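The embodiment does not mandate a particular down-mix. Purely as an illustration, a conventional mid/side down-mix of one channel pair could look like the following sketch; the function and parameter names are hypothetical:

/* Illustrative mid/side down-mix of one channel pair (ch1, ch2); this is */
/* only one possible down-mix, not the one required by this embodiment.   */
void downmixPair(const float *ch1, const float *ch2,
                 float *mid, float *side, int frameLen)
{
    for (int i = 0; i < frameLen; i++) {
        mid[i]  = 0.5f * (ch1[i] + ch2[i]); /* primary transmission channel  */
        side[i] = 0.5f * (ch1[i] - ch2[i]); /* residual transmission channel */
    }
}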
In some embodiments of the present application, before the mute flag information of the multi-channel signal is obtained in step 401, the encoding method of the multi-channel signal performed by the encoding end further includes:
d1, preprocessing the multichannel signal to obtain a preprocessed multichannel signal, wherein the preprocessing comprises at least one of the following steps: transient detection, window judgment, time-frequency transformation, frequency domain noise shaping, time domain noise shaping and frequency band extension coding;
in the implementation scenario of the foregoing implementation step D1, step 401 acquires mute flag information of the multi-channel signal, including:
and performing mute mark detection on the preprocessed multi-channel signal to obtain mute mark information.
The input signal of silence flag detection may be an original input multi-channel signal or a pre-processed multi-channel signal. Pretreatment may include, but is not limited to: transient detection, window judgment, time-frequency transformation, frequency domain noise shaping, time domain noise shaping, frequency band extension coding and the like. The multi-channel signal may be a time domain signal or a frequency domain signal. Through the preprocessing process, the coding efficiency of the multichannel signal can be improved.
In some embodiments of the present application, the encoding method of the multi-channel signal performed by the encoding end further includes:
e1, preprocessing the multichannel signal to obtain a preprocessed multichannel signal, wherein the preprocessing comprises at least one of the following steps: transient detection, window judgment, time-frequency transformation, frequency domain noise shaping, time domain noise shaping and frequency band extension coding;
and E2, correcting the mute flag information according to the preprocessed multi-channel signal.
The encoding end can preprocess the multi-channel signal. Preprocessing may include, but is not limited to: transient detection, window judgment, time-frequency transformation, frequency domain noise shaping, time domain noise shaping, frequency band extension coding, and the like. The multi-channel signal may be a time domain signal or a frequency domain signal. After preprocessing, the mute flag information in step 401 may also be corrected according to the preprocessed multi-channel signal; for example, after frequency domain noise shaping, the signal energy of a certain channel of the multi-channel signal changes, and the mute flag detection result of that channel may be adjusted accordingly.
403. Generating a code stream according to the transmission channel signals of the transmission channels and the mute flag information, where the code stream includes: the mute flag information and the multi-channel quantization encoding result of the transmission channel signal of each transmission channel.
The encoding end generates the code stream, and the code stream includes the mute flag information, so that the decoding end can acquire the mute flag information and decode the code stream based on it, which facilitates the decoding end performing decoding processing, such as bit allocation, in a manner consistent with the encoding end.
In some embodiments of the present application, step 403 generates a code stream according to the transmission channel signal and mute flag information of each transmission channel, including:
F1, adjusting an initial multi-channel processing mode according to the mute flag information to obtain an adjusted multi-channel processing mode;
and F2, encoding the multichannel signal according to the adjusted multichannel processing mode to obtain a code stream.
The encoding end can adjust the initial multi-channel processing mode according to the mute flag information, and then encode the multi-channel signal according to the adjusted multi-channel processing mode, so that coding efficiency can be improved. For example, during the screening of the multi-channel signals, a channel whose mute flag is 1 does not participate in the group pair screening, as sketched below.
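A minimal sketch of such an adjustment, assuming muteFlag[ch] has already been determined; the names are hypothetical:

/* Sketch: channels whose mute flag is 1 are excluded from group pair screening. */
int screenChannels(const int *muteFlag, int numChannels, int *candidates)
{
    int n = 0;
    for (int ch = 0; ch < numChannels; ch++) {
        if (muteFlag[ch] == 0) {
            candidates[n++] = ch; /* only non-mute channels participate in group pairing */
        }
    }
    return n; /* number of channels kept for group pair screening */
}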
In some embodiments of the present application, step 403 generates a code stream according to the transmission channel signal and mute flag information of each transmission channel, including:
G1, performing bit allocation for each transmission channel according to the mute flag information, the available bit number, and the multi-channel side information, to obtain a bit allocation result of each transmission channel;
and G2, coding the transmission channel signals of each transmission channel according to the bit allocation result of each channel to obtain a code stream.
The encoding end can use the mute flag information for bit allocation of the transmission channels: first, initial bit allocation is performed for each transmission channel according to the available bit number and the multi-channel side information, and then the allocation is adjusted according to the mute flag information to obtain the bit allocation result of each transmission channel. The transmission channel signals are encoded according to the bit allocation result of each transmission channel to obtain the code stream, which may be referred to as an encoded code stream or a code stream of the multi-channel signal.
Further, in some embodiments of the present application, step G1 of performing bit allocation for each transmission channel according to the mute flag information, the available bit number, and the multi-channel side information includes:
And G11, according to the available bit number and the multi-channel side information, performing bit allocation for each transmission channel according to a bit allocation strategy corresponding to the mute flag information.
The encoding end can allocate bits for each transmission channel according to the mute flag information. The mute enable flag may be used to select different bit allocation strategies. The specific content of the bit allocation strategy is not limited; an example is as follows: assuming that the mute enable flag includes a sound bed mute enable flag bedMuteEna and an object mute enable flag objMuteEna, bit allocation is performed according to the mute flag information. First, a first bit allocation may be performed according to the total available bits and the signal characteristics of each transmission channel. The bit allocation result is then adjusted according to the mute flag information, and the transmission efficiency of the multi-channel signal is improved by adjusting the bit allocation. For example, if the object mute enable flag objMuteEna is 1, the bits first allocated to channels with muteFlag equal to 1 in the object signal are reallocated to the sound bed signal or other object channels. If the sound bed mute enable flag bedMuteEna and the object mute enable flag are both 1, the bits first allocated to channels with muteFlag equal to 1 among the object channels may be reallocated to other object channels, and the bits first allocated to channels with muteFlag equal to 1 in the sound bed signal may be reallocated to other sound bed channels. A sketch of this adjustment is given below.
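The following sketch illustrates the objMuteEna = 1 case described above, assuming an initial allocation bits[ch] already exists; the even redistribution of the freed bits and the parameter firstObjCh (the first object channel number, M in the numbering example above) are assumptions of this illustration:

/* Sketch: when objMuteEna is 1, bits first allocated to muted object   */
/* channels are redistributed to the sound bed and the other object     */
/* channels. The even split used here is illustrative only.             */
void reallocMuteBits(int *bits, const int *muteFlag,
                     int firstObjCh, int numChannels)
{
    int freed = 0, active = 0;
    for (int ch = firstObjCh; ch < numChannels; ch++) {
        if (muteFlag[ch] == 1) { freed += bits[ch]; bits[ch] = 0; }
    }
    for (int ch = 0; ch < numChannels; ch++) {
        if (muteFlag[ch] == 0) active++;
    }
    if (active == 0) return;
    for (int ch = 0; ch < numChannels; ch++) {
        if (muteFlag[ch] == 0) bits[ch] += freed / active; /* remainder handling omitted */
    }
}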
Further, in some embodiments of the present application, the multi-channel side information includes: a channel bit allocation proportion,
wherein the channel bit allocation proportion is used to indicate the bit allocation proportion among the non-low frequency effects (LFE) channels in the multi-channel signal.
The low frequency effects (LFE) channel is an audio channel carrying bass content in the range of about 3-120 Hz, which can be routed to a speaker designed specifically for low frequencies. The channel bit allocation proportion is used to indicate the bit allocation proportion of the non-LFE channels. For example, the channel bit allocation proportion occupies 6 bits. The embodiment of the present application does not limit the number of bits occupied by the channel bit allocation proportion.
For example, the channel bit allocation proportion may be a channel bit allocation proportion field in the multi-channel side information, denoted chbitrates and occupying 6 bits, used to indicate the bit allocation proportion of all channels except the LFE channel in the multi-channel signal. The bit allocation proportion of each transmission channel can be indicated through the channel bit allocation proportion field, so that the number of bits obtained by each transmission channel is determined. Without limitation, the number of bits may be further converted into a number of bytes. A sketch of one possible conversion follows.
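As a sketch of that conversion, assuming the 6-bit field holds a proportion normalized to 63, the maximum value of a 6-bit field (this normalization is an assumption, not stated by the embodiment):

/* Hypothetical conversion of the 6-bit proportion field into a bit count. */
int channelBits(int chBitRatio, int totalBits)
{
    int bits = (totalBits * chBitRatio) / 63; /* 63 = maximum of a 6-bit field      */
    return bits;                              /* may be converted to bytes: bits / 8 */
}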
In some embodiments of the application, the multi-channel side information includes at least one of: inter-channel amplitude difference parameter quantization codebook index, channel group pair number, and channel pair index;
wherein the inter-channel amplitude difference parameter quantization codebook index is used to indicate the quantized codebook index of the inter-channel amplitude difference (Interaural Level Difference, ILD) parameter of each channel;
a channel group pair number for representing a channel group pair number of a current frame of the multi-channel signal;
the channel pair index is used for representing the index of the channel pair.
The embodiment of the present application does not limit the number of bits occupied by the inter-channel amplitude difference parameter quantization codebook index. For example, the inter-channel amplitude difference parameter quantization codebook index occupies 5 bits. The inter-channel amplitude difference parameter quantization codebook indices may be represented as mcIld[ch1] and mcIld[ch2], each occupying 5 bits; the quantized codebook index of the inter-channel amplitude difference ILD parameter of each channel in the current channel pair is used to restore the amplitude of the decoded spectrum.
The embodiment of the present application does not limit the number of bits occupied by the channel group pair count. For example, the channel group pair count is represented as pairCnt and occupies 4 bits, and is used to represent the number of channel group pairs of the current frame.
The embodiment of the present application does not limit the number of bits occupied by the channel pair index. For example, the channel pair index is represented as channelPairIndex and occupies a number of bits related to the total number of channels; it is used to represent the index of a channel pair, and the index values of the two channels in the current channel pair, that is, ch1 and ch2, can be obtained by parsing it.
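For illustration, the side information fields described above could be grouped as follows; the structure, the array bound, and the field types are assumptions, while the field widths follow the bit counts given in the text:

/* Illustrative container for the multi-channel side information fields. */
#define MAX_PAIRS 16                 /* array bound: assumption           */

typedef struct {
    int pairCnt;                     /* number of channel group pairs, 4 bits              */
    int channelPairIndex[MAX_PAIRS]; /* channel pair index, width depends on channel count */
    int mcIld[MAX_PAIRS][2];         /* 5-bit ILD codebook index for ch1/ch2 of each pair  */
    int chbitrates;                  /* 6-bit channel bit allocation proportion field      */
} McSideInfo;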
In some embodiments of the present application, in addition to performing the foregoing steps, the encoding method performed by the encoding apparatus for encoding a multi-channel signal further includes:
the code stream is transmitted to a decoding device.
In the embodiment of the present application, after the encoding end obtains the transmission channel signals of the transmission channels and the mute flag information, it can generate a code stream that carries the mute flag information, and the encoding end can send the code stream to the decoding end.
As can be seen from the foregoing examples, mute flag detection is performed on the multi-channel signal to obtain the mute flag information, where the mute flag information includes: a mute enable flag, and/or a mute flag; multi-channel encoding processing is performed on the multi-channel signal to obtain the transmission channel signal of each transmission channel; and a code stream is generated according to the transmission channel signals of the transmission channels and the mute flag information, where the code stream includes: the mute flag information and the multi-channel quantization encoding result of the transmission channel signal of each transmission channel. Since subsequent encoding processing is performed according to the mute flag information, coding efficiency can be improved.
The embodiment of the application also provides a decoding method of the multi-channel signal, which can be executed by a terminal device, for example, the terminal device can be a decoding device (hereinafter referred to as a decoding end or a decoder, for example, the decoding end can be an AI decoder) of the multi-channel signal. As shown in fig. 5, the method executed by the decoding end in the embodiment of the present application mainly includes:
501. Parsing mute flag information from a code stream of an encoding device, and determining encoding information of each transmission channel according to the mute flag information, where the mute flag information includes: a mute enable flag, and/or a mute flag.
The decoding end adopts a processing manner inverse to that of the encoding end. It first receives the code stream from the encoding device; because the code stream carries the mute flag information, the encoding information of each transmission channel is determined according to the mute flag information, where the mute flag information includes: a mute enable flag, and/or a mute flag. For the description of the mute enable flag and the mute flag, refer to the foregoing description of the encoding end embodiment; details are not repeated here.
In some embodiments of the present application, step 501 parses silence flag information from a code stream of an encoding device, including:
H1, parsing the mute flag of each channel from the code stream; or
H2, parsing a mute enable flag from the code stream, and if the mute enable flag is a first value, parsing the mute flags from the code stream; or
H3, parsing the sound bed mute enable flag and/or the object mute enable flag, and the mute flag of each channel, from the code stream; or
H4, parsing the sound bed mute enable flag and/or the object mute enable flag from the code stream, and parsing the mute flags of partial channels from the code stream according to the sound bed mute enable flag and/or the object mute enable flag.
The decoding end parses the mute flag information from the code stream of the encoding device, and the mute flag information obtained by the decoding end corresponds to the specific content of the mute flag information generated by the encoding device. Specifically, in one mode, the mute flag is used to indicate whether each channel is a mute channel, where a mute channel is a channel that does not need to be encoded or a channel that only needs to be encoded with a low number of bits, and the decoding end can parse the mute flag of each channel from the code stream. In one mode, the mute enable flag may be used to indicate whether each channel is a non-mute channel. For example, when the mute enable flag is a first value (e.g., 1), it indicates that the mute flag of each channel needs to be further parsed; when the mute enable flag is a second value (e.g., 0), it indicates that each channel is a non-mute channel. The decoding end parses the mute enable flag from the code stream, and if the mute enable flag is the first value, parses the mute flags from the code stream. In one mode, the mute enable flag includes: a sound bed mute enable flag and/or an object mute enable flag, and the decoding end parses the sound bed mute enable flag and/or the object mute enable flag, and the mute flag of each channel, from the code stream. In one mode, the decoding end parses the sound bed mute enable flag and/or the object mute enable flag from the code stream, and parses the mute flags of partial channels from the code stream according to the sound bed mute enable flag and/or the object mute enable flag. Which partial channels the resulting mute flags belong to is not limited. A parsing sketch for the second mode is given below.
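As an illustration of the second mode (H2), a decoder-side parsing sketch might read the enable flag first and the per-channel mute flags only when it equals the first value; Bitstream and readBit() are hypothetical helpers, not interfaces defined by this embodiment:

/* Sketch of mode H2: parse the mute enable flag, then the per-channel flags. */
typedef struct Bitstream Bitstream; /* hypothetical bitstream reader type */
extern int readBit(Bitstream *bs);  /* hypothetical: reads one bit        */

void parseMuteInfo(Bitstream *bs, int *hasSilFlag, int *muteFlag, int numChannels)
{
    *hasSilFlag = readBit(bs);
    for (int ch = 0; ch < numChannels; ch++) {
        /* when the enable flag is the second value (0), every channel is non-mute */
        muteFlag[ch] = (*hasSilFlag == 1) ? readBit(bs) : 0;
    }
}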
502. Decoding the encoding information of each transmission channel to obtain the decoded signal of each transmission channel.
After the decoding end obtains the encoding information of each transmission channel from the code stream, it can decode the encoding information of each transmission channel; the decoding and inverse quantization process is inverse to the quantization encoding process of the encoding end, so that the decoded signal of each transmission channel can be obtained.
In some embodiments of the present application, step 502 decodes the encoded information for each transmission channel, including:
I1, parsing the multi-channel side information from the code stream;
I2, performing bit allocation for each transmission channel according to the multi-channel side information and the mute flag information, so as to obtain the number of coding bits of each channel;
I3, decoding the encoding information of each transmission channel according to the number of coding bits of each channel.
The code stream may further include multi-channel side information. The decoding end may perform bit allocation for each transmission channel according to the multi-channel side information and the mute flag information to obtain the number of coding bits of each channel; the number of coding bits obtained by the decoding end is the same as the number of coding bits used by the encoding end. The encoding information of each transmission channel is then decoded according to the number of coding bits of each transmission channel, so as to decode the transmission channel signal of each transmission channel.
Further, in some embodiments of the present application, the multi-channel side information includes: the channel bit allocation proportion field,
wherein the channel bit allocation proportion field is used to indicate the bit allocation proportion of the non-low frequency effects (Low Frequency Effects, LFE) channels among the channels.
The low frequency effects (LFE) channel is an audio channel carrying bass content in the range of about 3-120 Hz, which can be routed to a speaker designed specifically for low frequencies. For example, the channel bit allocation proportion field occupies 6 bits. The embodiment of the present application does not limit the number of bits occupied by the channel bit allocation proportion field.
For example, the channel bit allocation proportion field is denoted chbitrates and occupies 6 bits, and is used to indicate the bit allocation proportion of the non-LFE channels. The bit allocation proportion field can indicate the bit allocation proportion of each channel, so as to determine the number of bits obtained by each channel. Without limitation, the number of bits may be further converted into a number of bytes.
In some embodiments of the application, the multi-channel side information includes at least one of: inter-channel amplitude difference parameter quantization codebook index, channel group pair number, and channel pair index;
The inter-channel amplitude difference parameter quantization codebook index is used for indicating inter-channel amplitude difference ILD parameter quantization codebook indexes of each channel;
a channel group pair number for representing a channel group pair number of a current frame of the multi-channel signal;
the channel pair index is used for representing the index of the channel pair.
The embodiment of the present application does not limit the number of bits occupied by the inter-channel amplitude difference parameter quantization codebook index. For example, the inter-channel amplitude difference parameter quantization codebook index occupies 5 bits. The inter-channel amplitude difference parameter quantization codebook indices may be represented as mcIld[ch1] and mcIld[ch2], each occupying 5 bits; the quantized codebook index of the inter-channel amplitude difference ILD parameter of each channel in the current channel pair is used to restore the amplitude of the decoded spectrum.
The embodiment of the present application does not limit the number of bits occupied by the channel group pair count. For example, the channel group pair count is represented as pairCnt and occupies 4 bits, and is used to represent the number of channel group pairs of the current frame.
The embodiment of the present application does not limit the number of bits occupied by the channel pair index. For example, the channel pair index is represented as channelPairIndex and occupies a number of bits related to the total number of channels; it is used to represent the index of a channel pair, and the index values of the two channels in the current channel pair, that is, ch1 and ch2, can be obtained by parsing it, as sketched below.
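The exact mapping from channelPairIndex to the two channel indices is not specified here. One plausible sketch enumerates all unordered channel pairs in a fixed order, which also shows why the field width depends on the total number of channels; the enumeration order is an assumption of this illustration:

/* Hypothetical decoding of channelPairIndex into (ch1, ch2); the fixed  */
/* enumeration order of unordered pairs is an assumption of this sketch. */
void decodePairIndex(int channelPairIndex, int numChannels, int *ch1, int *ch2)
{
    int idx = 0;
    for (int a = 0; a < numChannels - 1; a++) {
        for (int b = a + 1; b < numChannels; b++) {
            if (idx == channelPairIndex) { *ch1 = a; *ch2 = b; return; }
            idx++;
        }
    }
}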
In some embodiments of the present application, step I2 performs bit allocation for each transmission channel according to the multi-channel side information and the mute flag information, including:
i21, determining a first residual bit number according to the available bit number and the safety bit number;
the value of the number of safety bits is not limited, for example, the number of safety bytes is expressed as safeBits, the number of safety bytes is 8 bits, and the first remaining number of bits can be obtained by subtracting the number of safety bits from the number of available bits.
I22, distributing a first residual bit number to each channel according to a channel bit distribution proportion field in the multi-channel side information, wherein the channel bit distribution proportion field is used for indicating the bit distribution proportion of each channel;
i23, when a second residual bit number exists after the first residual bit number is allocated to each channel, allocating the second residual bit number to each channel according to the channel bit allocation proportion field;
wherein the second remaining number of bits is obtained by subtracting the number of bits allocated to each channel from the first remaining number of bits.
I24, when a third residual bit number exists after the second residual bit number is allocated to each channel, allocating the third residual bit number to the channel that was allocated the largest number of bits when the first residual bit number was allocated;
Wherein the third remaining number of bits is obtained by subtracting the number of bits allocated to each channel from the second remaining number of bits.
And I25, when the number of bits allocated to the first channel in each channel exceeds the upper limit of the number of bits of a single channel, allocating the exceeding number of bits to other channels except the first channel in each channel.
The upper limit of the number of bits of a single channel is not limited. The first channel may be any one of the respective channels.
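Steps I21 to I24 can be summarized by the following sketch, in which ratio[ch] is the per-channel proportion taken from the channel bit allocation proportion field; the normalization by the sum of the proportions is an assumption of this illustration, and the single-channel upper limit of step I25 is omitted for brevity:

/* Sketch of the bit allocation cascade of steps I21-I24. */
void allocateBits(int availableBits, int safeBits, const int *ratio,
                  int *bits, int numChannels)
{
    int remaining = availableBits - safeBits;   /* I21: first remaining bits */
    int ratioSum = 0, assigned = 0, maxCh = 0;
    for (int ch = 0; ch < numChannels; ch++) ratioSum += ratio[ch];
    if (ratioSum == 0) return;
    for (int ch = 0; ch < numChannels; ch++) {  /* I22: proportional pass    */
        bits[ch] = remaining * ratio[ch] / ratioSum;
        assigned += bits[ch];
        if (bits[ch] > bits[maxCh]) maxCh = ch; /* channel with most bits    */
    }
    int second = remaining - assigned;          /* I23: second remaining bits */
    assigned = 0;
    for (int ch = 0; ch < numChannels; ch++) {
        int extra = second * ratio[ch] / ratioSum;
        bits[ch] += extra;
        assigned += extra;
    }
    bits[maxCh] += second - assigned;           /* I24: third remaining bits  */
}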
503. Performing multi-channel decoding processing on the decoded signals of the transmission channels to obtain the multi-channel decoded output signal.
The decoding end obtains the decoded signal of each transmission channel through decoding, and then performs further multi-channel decoding processing on the decoded signals of the transmission channels, so as to obtain the decoded output signal.
In some embodiments of the present application, after performing multi-channel decoding processing on the decoded signals of each transmission channel in step 503 to obtain multi-channel decoded output signals, the decoding method of the multi-channel signal performed by the decoding end further includes:
and J1, carrying out post-processing on the multichannel decoding output signals, wherein the post-processing comprises at least one of the following steps: band extension decoding, inverse time domain noise shaping, inverse frequency domain noise shaping, inverse time frequency transformation.
The post-processing of the output signal is inverse to the preprocessing at the encoding end, and the specific processing manner is not limited.
As can be seen from the foregoing illustration, in the embodiment of the present application, the decoding end may obtain silence flag information from the code stream of the encoding end, so that the decoding end may perform decoding processing, such as bit allocation, in a manner consistent with that of the encoding end.
In order to better understand and implement the above-mentioned schemes of the embodiments of the present application, the following specific description will exemplify the corresponding application scenario.
The products involved include a mobile phone terminal, a chip, and a wireless network.
An encoding end of the embodiment is shown in fig. 6, and includes a silence flag detection unit, a multi-channel encoding processing unit, a multi-channel quantization encoding unit, and a code stream multiplexing interface.
The mute flag detection unit is mainly used to perform mute flag detection on the input signal and determine the mute flag information. The mute flag information may include a mute enable flag and/or a mute flag.
The mute enable flag is denoted HasSilFlag, and may be a global mute enable flag or a partial mute enable flag. For example, an object mute enable flag acting only on the object signal in the multi-channel signal is denoted objMuteEna. As another example, a sound bed mute enable flag acting only on the sound bed signal in the multi-channel signal is denoted bedMuteEna.
The global mute enable mark is a mute enable mark acting on the multi-channel signal, and when the multi-channel signal only comprises the sound bed signal, the global mute enable mark is a mute enable mark acting on the sound bed signal; when the multichannel signal only comprises an object signal, the global mute enable mark is a mute enable mark acting on the object signal; when the multi-channel signal contains a sound bed signal and an object signal, the global silence enable flag is a silence enable flag acting on the sound bed signal and the object signal.
The partial mute enable flag is a mute enable flag applied to a partial channel in the multi-channel signal, and the partial channel is preset, for example: the partial silence enable flag is an object silence enable flag acting on the object signal, or the partial silence enable flag is a sound bed silence enable flag acting on the sound bed signal, or the partial silence enable flag is a silence enable flag acting on other channel signals of the multi-channel signal that do not include LFE channel signals. The partial mute enable flag is a mute enable flag for a channel signal acting on a participating group pair in a multi-channel signal. The specific manner of performing the group-pairing process on the multi-channel signal in the embodiment of the application is not limited.
The silence enable flag is used to indicate whether silence detection is on. For example, when the silence enable flag is a first value (e.g., 1), the silence detection function is turned on, and the silence flag of each channel is further detected. When the silence enable flag is a second value (e.g., 0), it indicates that the silence detection function is turned off.
The mute enable flag may also be used to indicate whether further transmission of the mute flag for each channel is required. For example, when the mute enable flag is a first value (e.g., 1), it indicates that further transmission of the mute flag for each channel is required. When the mute enable flag is a second value (e.g., 0), it indicates that no further transmission of the mute flag for each channel is required.
The mute enable flag may also be used to indicate whether each channel is an unmuted channel. For example, when the mute enable flag is a first value (e.g., 1), it indicates that further detection of the mute flag for each channel is required. When the mute enable flag is a second value (e.g., 0), it indicates that each channel is an unmuted channel.
The global silence enable flag acts on all channels and the partial silence enable flag acts on part of the channels. For example, the object silence enable flag is applied to a channel corresponding to an object signal in the multi-channel signal, and the sound bed silence enable flag is applied to a channel corresponding to a sound bed signal in the multi-channel signal.
The silence enabling flag may be controlled by an external input, may be preset according to encoder parameters such as a coding rate, a coding bandwidth, etc., and may be determined according to silence detection results of each channel.
The mute flag of each channel is used to indicate whether the channel is a mute frame. The mute flag of each channel is denoted silFlag[ch], where ch is the channel number, ch = 0 … N-1, and N is the total number of channels of the input signal to be encoded; the number of channels of the sound bed signal is M, the number of channels of the object signal is P, and the total number of channels N = M + P. For example, the signal to be encoded is a mixed signal including the sound bed signal and the object signal, where: the sound bed signal is a 5.1.4 channel signal, so the channel number M = 10 for the sound bed signal; the number of object signals is 4, so the channel number P = 4 for the object signals; and the total number of channels is 14. The channel numbers of the sound bed signal are from 0 to 9, and the channel numbers of the object signals are from 10 to 13. The mute flag silFlag[ch], ch = 0 … N-1, corresponds to the mute flag of each channel and is used to indicate whether each channel is a mute channel. A mute channel is a channel whose signal energy, decibel value, or loudness is below the auditory threshold; it is a channel that does not need to be encoded or only needs to be encoded with a low number of bits. When the value of the mute flag is a first value (e.g., 1), it indicates that the channel is a mute channel; when the value of the mute flag is a second value (e.g., 0), it indicates that the channel is a non-mute channel. When the value of the mute flag is the first value (e.g., 1), the channel is not encoded or is encoded with a low number of bits.
The input signal detected by the mute flag may be an original input signal or a signal after preprocessing. Pretreatment may include, but is not limited to: transient detection, window judgment, time-frequency transformation, frequency domain noise shaping, time domain noise shaping, frequency band extension coding and the like. The input signal may be a time domain signal or a frequency domain signal. Taking the input signal as a time domain signal of each channel in the multi-channel signal as an example, one method for detecting the mute flag of each channel may be:
and determining the energy of the signals of each channel of the current frame according to the input signals of each channel of the current frame.
Assuming a frame length FRAME_LEN, the energy energy(ch) of the ch-th channel of the current frame is:

energy(ch) = Σ_{i=0}^{FRAME_LEN-1} orig_ch(i)^2

where orig_ch is the input signal of the ch-th channel of the current frame, and energy(ch) is the energy of the ch-th channel of the current frame.
The mute detection parameter of each channel of the current frame is determined according to the energy of each channel signal of the current frame.
The mute detection parameter of each channel of the current frame is used to represent the energy value, power value, decibel value, or loudness value of each channel signal of the current frame.
For example, the mute detection parameter of each channel of the current frame may be a log-domain value of the energy of each channel signal of the current frame, such as log2(energy(ch)) or log10(energy(ch)). According to the energy of each channel signal of the current frame, the mute detection parameter of each channel of the current frame is calculated and satisfies:
energyDB[ch]=10*log10(energy[ch]/Bit_Depth/Bit_Depth);
where energyDB[ch] is the mute detection parameter of the ch-th channel of the current frame, energy[ch] is the energy of the ch-th channel of the current frame, and Bit_Depth is the full-scale value of the bit width; for example, when the sampling bit depth is 16 bits, the full-scale value of the bit width is 2^16 = 65536.
The mute flag of each channel of the current frame is determined according to the mute detection parameter and the mute detection threshold of each channel of the current frame.
The mute detection parameter of each channel of the current frame is compared with the mute detection threshold: if the mute detection parameter of the ch-th channel of the current frame is smaller than the mute detection threshold, the ch-th channel of the current frame is a mute frame, that is, the ch-th channel at the current moment is a mute channel, and the mute flag silFlag[ch] of the ch-th channel of the current frame is a first value (for example, 1). If the mute detection parameter of the ch-th channel of the current frame is greater than or equal to the mute detection threshold, the ch-th channel of the current frame is a non-mute frame, that is, the ch-th channel at the current moment is a non-mute channel, and the mute flag silFlag[ch] of the ch-th channel of the current frame is a second value (for example, 0).
According to the mute detection parameter and the mute detection threshold of the ch-th channel of the current frame, the pseudo code for determining the mute flag of the ch-th channel of the current frame is as follows:

silFlag[ch] = 0;
if (energyDB[ch] < g_MuteThrehold)
{
    silFlag[ch] = 1;
}
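Putting the energy computation, the log-domain conversion, and the threshold comparison together, a minimal sketch of the detection for one channel could be as follows; the frame length value is illustrative, and passing the threshold as a parameter (corresponding to g_MuteThrehold in the pseudo code above) is a choice of this sketch:

#include <math.h>

#define FRAME_LEN 960 /* frame length: illustrative value only */

/* Sketch of mute flag detection for one channel of the current frame.     */
/* orig is the channel's input signal; bitDepth is the sampling bit depth, */
/* e.g. 16, so the full-scale value is 2^16 = 65536.                       */
int detectMuteFlag(const float *orig, int bitDepth, double muteThreshold)
{
    double energy = 0.0;
    double fullScale = pow(2.0, (double)bitDepth);
    for (int i = 0; i < FRAME_LEN; i++) {
        energy += (double)orig[i] * orig[i];    /* energy(ch)      */
    }
    double energyDB = 10.0 * log10(energy / fullScale / fullScale);
    return (energyDB < muteThreshold) ? 1 : 0;  /* 1: mute channel */
}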
the mute flag information may include a mute enable flag and/or a mute flag, with different mute flag information being exemplified as follows:
Mode one: the mute flag information is the mute flag silFlag[ch] of each channel. The mute flag silFlag[ch] of each channel is determined, written into the code stream, and transmitted to the decoding end.
Mode two: the mute flag information includes a mute enable flag HasSilFlag and mute flags silFlag[ch].
The silence enable flag HasSilFlag indicates whether the current frame turns on the silence detection function, and may also be used to indicate whether the current frame transmits the silence detection result of each channel.
The mute enable flag HasSilFlag is determined, written into the code stream, and transmitted to the decoding end; whether to write the mute flags silFlag[ch] into the code stream is determined according to the value of the mute enable flag.
When the mute enable flag HasSilFlag is 0, the mute flags silFlag[ch] are not written into the code stream or transmitted to the decoding end.
When the mute enable flag HasSilFlag is 1, the mute flags silFlag[ch] are written into the code stream and transmitted to the decoding end.
Mode three: the mute flag information includes a sound bed mute enable flag bedMuteEna, an object mute enable flag objMuteEna, and the mute flag silFlag[ch] of each channel.
The sound bed mute enable flag bedMuteEna may be used to indicate whether the current frame turns on the mute detection function of the channels corresponding to the sound bed signal. Similarly, the object mute enable flag objMuteEna may be used to indicate whether the current frame turns on the mute detection function of the channels corresponding to the object signal. For example:
When the sound bed mute enable flag bedMuteEna is 0 and the object mute enable flag objMuteEna is 1, the mute flag values of the channels corresponding to the sound bed signal are all set to 0, i.e., non-mute channels, and the mute flag values of the channels corresponding to the object signal are the mute detection results.
When the sound bed mute enable flag bedMuteEna is 1 and the object mute enable flag objMuteEna is 0, the mute flag values of the channels corresponding to the object signal are all set to 0, i.e., non-mute channels, and the mute flag values of the channels corresponding to the sound bed signal are the mute detection results.
When the sound bed mute enable flag bedMuteEna is 0 and the object mute enable flag objMuteEna is 0, the mute flag value of each channel is set to 0, i.e., non-mute channels.
When the sound bed mute enable flag bedMuteEna is 1 and the object mute enable flag objMuteEna is 1, the mute flag of each channel is the mute detection result.
When the mute flag information includes the sound bed mute enable flag bedMuteEna, the object mute enable flag objMuteEna, and the mute flags, the mute flag of each channel may be transmitted.
Mode four: the mute flag information includes a sound bed mute enable flag bedMuteEna, an object mute enable flag obj muteena, and a mute flag silFlag [ i ] of a partial channel.
Mode four differs from mode three in that only the mute flags of part of the channels are transmitted (see the sketch after this paragraph). For example, when the sound bed mute enable flag bedMuteEna is 0 and the object mute enable flag objMuteEna is 1, only the mute flags of the channels corresponding to the object signal may be transmitted, and the mute flags of the channels corresponding to the sound bed signal are not transmitted; when bedMuteEna is 1 and objMuteEna is 0, only the mute flags of the channels corresponding to the sound bed signal may be transmitted; when bedMuteEna is 0 and objMuteEna is 0, no mute flag needs to be transmitted; when bedMuteEna is 1 and objMuteEna is 1, the mute flag of each channel is transmitted.
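As an illustrative sketch only, the conditional transmission of mode four may be written as follows, reusing the BitStream type and the pushBit() helper of the previous sketch; the assumption that the sound bed channels precede the object channels in silFlag[] is made for illustration:

    /* Mode four: write both enable flags, then only the mute flags of the
       channel groups whose enable flag is 1. */
    static void writeMuteInfoModeFour(BitStream *bs, int bedMuteEna, int objMuteEna,
                                      const int *silFlag, int bedChNum, int objChNum)
    {
        pushBit(bs, bedMuteEna);                          /* sound bed mute enable flag */
        pushBit(bs, objMuteEna);                          /* object mute enable flag */
        if (bedMuteEna == 1)
            for (int ch = 0; ch < bedChNum; ch++)
                pushBit(bs, silFlag[ch]);                 /* sound bed channel mute flags */
        if (objMuteEna == 1)
            for (int ch = bedChNum; ch < bedChNum + objChNum; ch++)
                pushBit(bs, silFlag[ch]);                 /* object channel mute flags */
    }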
Mode five: the sound bed mute enable flag bedMuteEna and the object mute enable flag objMuteEna may be replaced with HasSilFlag = {HasSilFlag(0), HasSilFlag(1)}, where HasSilFlag(0) and HasSilFlag(1) correspond to bedMuteEna and objMuteEna, respectively. In other words, the sound bed mute enable flag bedMuteEna and the object mute enable flag objMuteEna may be represented by a 2-bit mute enable flag HasSilFlag. The embodiment of the application is not limited thereto.
Mode six: the mute flag of each channel is determined first, and then the mute enable flag is determined based on the mute flags of the channels.
For example, the mute enable flag may be a global mute enable flag. If the mute flags of all channels are 0, the global mute enable flag is set to 0; only the global mute enable flag needs to be written into the code stream and transmitted to the decoding side, and the mute flag of each channel does not need to be transmitted. If the mute flag of at least one channel is 1, the global mute enable flag is set to 1; the global mute enable flag is written into the code stream and transmitted to the decoding side together with the mute flag of each channel.
For another example, the mute enable flags may be the sound bed mute enable flag bedMuteEna and the object mute enable flag objMuteEna. Taking bedMuteEna as an example: if the mute flags of all channels corresponding to the sound bed signal are 0, bedMuteEna is set to 0; only bedMuteEna needs to be written into the code stream and transmitted to the decoding side, and the mute flags of the channels corresponding to the sound bed signal do not need to be transmitted. If the mute flag of at least one channel corresponding to the sound bed signal is 1, bedMuteEna is set to 1; bedMuteEna is written into the code stream and transmitted to the decoding side together with the mute flags of the channels corresponding to the sound bed signal. The object mute enable flag objMuteEna is processed similarly and is not described here again.
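As an illustrative sketch only, the derivation of a mute enable flag in mode six may be written as follows; the helper name deriveMuteEna and the channel-range arguments are assumptions of this sketch:

    /* Mode six: an enable flag becomes 1 as soon as one channel of the group is mute. */
    static int deriveMuteEna(const int *silFlag, int firstCh, int chNum)
    {
        for (int ch = firstCh; ch < firstCh + chNum; ch++)
            if (silFlag[ch] == 1)
                return 1;      /* at least one mute channel in the group */
        return 0;              /* all channels non-mute */
    }

    /* Usage: bedMuteEna = deriveMuteEna(silFlag, 0, bedChNum);
              objMuteEna = deriveMuteEna(silFlag, bedChNum, objChNum); */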
The foregoing modes are merely examples of some implementations; the specific implementation is not limited thereto.
The multi-channel encoding processing unit is used for completing the screening, pairing and down-mixing of the multi-channel signal and generating the multi-channel side information, so as to obtain the transmission channel signals after multi-channel pair down-mixing.
Optionally, preprocessing may further be included between the mute flag detection processing and the multi-channel encoding processing, for preprocessing the input signal to obtain a preprocessed input for the multi-channel encoding processing. The preprocessing may include, but is not limited to: transient detection, windowing decision, time-frequency transformation, frequency-domain noise shaping, time-domain noise shaping, band extension coding, and the like. As shown in fig. 7, screening is performed based on the multi-channel input signal or the preprocessed multi-channel signal to obtain the screened multi-channel signal. Group pairing is performed on the screened multi-channel signal to obtain the multi-channel group pair signal. Down-mixing (for example, mid-side (MS) processing) is performed on the multi-channel group pair signal to obtain the multi-channel group pair down-mixed signal to be encoded.
Optionally, the mute flag information may be modified during the preprocessing. For example, after frequency-domain noise shaping, the energy of the signal of a certain transmission channel changes, and the mute detection result of that channel may be adjusted accordingly.
The multi-channel side information includes, but is not limited to: the group pair number, the group pair channel index list, the group pair inter-channel level difference (ILD) coefficient list, and the group pair channel ILD scaling flag list.
Optionally, the initial multi-channel processing mode may be adjusted according to the mute flag information. For example, during the screening of the multi-channel signal, a channel with mute flag 1 does not participate in the group pair screening.
The multi-channel quantization coding unit is used for performing quantization coding on each transmission channel signal after multi-channel group pair down-mixing.
Multi-channel quantization coding includes bit allocation processing and coding.
Optionally, performing bit allocation according to the mute flag information, the available bit number and the multi-channel side information; and coding according to the bit allocation result of each channel to obtain a coded code stream.
One specific implementation of multi-channel quantization coding is to transform the group pair down-mixed signals through a neural network to obtain latent features, then quantize the latent features and perform interval coding. Another specific implementation is to quantize the group pair down-mixed signals based on vector quantization. The embodiment of the present application is not limited thereto.
Alternatively, bit allocation may be performed according to silence flag information. For example, different bit allocation strategies are selected based on the silence enable flag.
Assuming that the mute enable flags include the sound bed mute enable flag bedMuteEna and the object mute enable flag objMuteEna, bit allocation according to the mute flag information may first perform a first bit allocation according to the total available bits and the signal characteristics of each channel, and then adjust the bit allocation result according to the mute flag information. For example, if the object mute enable flag objMuteEna is 1, the bits first allocated to a channel with mute flag 1 in the object signal are reallocated to the sound bed signal or to other object channels. If the sound bed mute enable flag bedMuteEna and the object mute enable flag objMuteEna are both 1, the bits first allocated to a channel with mute flag 1 among the object channels may be reallocated to other object channels, and the bits first allocated to a channel with mute flag 1 in the sound bed signal may be reallocated to other sound bed channels.
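As an illustrative sketch only, the adjustment step described above may look as follows in C when the freed bits of mute object channels are handed to the other object channels (one of the two options mentioned above); channelBytes[] holding the first allocation, and the ordering of object channels after the sound bed channels, are assumptions of this sketch:

    /* Reallocate the first-pass bytes of mute object channels to the
       non-mute object channels, one byte at a time. */
    static void reallocMuteObjectBits(int *channelBytes, const int *silFlag,
                                      int bedChNum, int objChNum, int objMuteEna)
    {
        if (objMuteEna != 1) return;
        int freed = 0, nonMuteObj = 0;
        for (int ch = bedChNum; ch < bedChNum + objChNum; ch++) {
            if (silFlag[ch] == 1) { freed += channelBytes[ch]; channelBytes[ch] = 0; }
            else nonMuteObj++;
        }
        while (freed > 0 && nonMuteObj > 0)
            for (int ch = bedChNum; ch < bedChNum + objChNum && freed > 0; ch++)
                if (silFlag[ch] == 0) { channelBytes[ch]++; freed--; }
    }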
The code stream multiplexing interface multiplexes the encoded channels into a serial bit stream for convenient transmission in the channels or storage in digital media.
The decoding end of this embodiment includes, as shown in fig. 8, a code stream demultiplexing unit, a channel decoding dequantization unit, a multi-channel decoding processing unit, and a multi-channel post-processing unit.
And the code stream demultiplexing unit analyzes the mute flag information from the received code stream and determines the coding information of each channel.
And analyzing the mute flag information from the received code stream, wherein the analysis process is the reverse process of writing the mute flag information into the code stream by the coding end.
For example, if the encoding end adopts mode one, the decoding end parses the mute flags silFlag[ch], ch = 0, ..., N-1, of the channels from the code stream, where N is the number of channels of the multi-channel signal to be decoded.
Or, if the encoding end adopts mode two, the decoding end first parses the mute enable flag HasSilFlag from the code stream; if the mute enable flag HasSilFlag is a first value (e.g., 1), the mute flags silFlag[ch], ch = 0, ..., N-1, are parsed from the code stream, where N is the number of channels of the multi-channel signal to be decoded.
Or, if the encoding end adopts mode three, the decoding end parses the sound bed mute enable flag bedMuteEna, the object mute enable flag objMuteEna and the mute flags silFlag[ch] of the N channels from the code stream, where N is the number of channels of the multi-channel signal to be decoded.
Or, if the encoding end adopts mode four, the decoding end first parses the sound bed mute enable flag bedMuteEna and the object mute enable flag objMuteEna from the code stream, and then parses the mute flags of the corresponding channels from the code stream according to the parsed bedMuteEna and objMuteEna. For example: when bedMuteEna is 0 and objMuteEna is 1, the mute flags of the channels corresponding to the object signal are parsed from the code stream; when bedMuteEna is 1 and objMuteEna is 0, the mute flags of the channels corresponding to the sound bed signal are parsed from the code stream; when bedMuteEna is 0 and objMuteEna is 0, no mute flag needs to be parsed from the code stream; when bedMuteEna is 1 and objMuteEna is 1, the mute flag of each channel is parsed from the code stream, and the number of parsed channels is the sum of the number of channels corresponding to the sound bed signal and the number of channels corresponding to the object signal.
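As an illustrative sketch only, the mode-four parsing at the decoding end mirrors the writing order; the readBit() helper and the BitStream type are assumptions of this sketch (readBit reads back the bits in the order pushBit wrote them):

    static int readBit(BitStream *bs)
    {
        int v = (bs->buf[bs->bitPos >> 3] >> (7 - (bs->bitPos & 7))) & 1;
        bs->bitPos++;
        return v;
    }

    /* Mode four parsing: read both enable flags, default every mute flag to 0,
       then read only the flags of the enabled channel groups. */
    static void parseMuteInfoModeFour(BitStream *bs, int *silFlag,
                                      int bedChNum, int objChNum)
    {
        int bedMuteEna = readBit(bs);
        int objMuteEna = readBit(bs);
        for (int ch = 0; ch < bedChNum + objChNum; ch++)
            silFlag[ch] = 0;
        if (bedMuteEna == 1)
            for (int ch = 0; ch < bedChNum; ch++)
                silFlag[ch] = readBit(bs);
        if (objMuteEna == 1)
            for (int ch = bedChNum; ch < bedChNum + objChNum; ch++)
                silFlag[ch] = readBit(bs);
    }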
Taking one of the foregoing modes as an example, the specific syntax with which the decoding end parses the mute flag information from the code stream is given by the corresponding syntax table.
[Syntax table not reproduced]
The multi-channel side information is parsed from the received code stream.
And carrying out bit allocation according to the multi-channel side information, and determining the coding bit number of each channel. Optionally, if the encoding end performs bit allocation according to the mute flag information, the decoding side also needs to perform bit allocation according to the mute flag information to determine the number of encoded bits of each channel.
And determining the coding information of each channel from the received code stream according to the coding bit number of each channel.
The channel decoding dequantization unit performs inverse coding and inverse quantization on the coded information of each channel to obtain the multi-channel group pair down-mixed decoded signal.
The inverse coding and inverse quantization are inverse processes of the encoding-side multi-channel quantization coding.
The multi-channel decoding processing unit is used for performing multi-channel decoding processing on the multi-channel group pair down-mixed decoded signal to obtain the multi-channel output signal.
The multi-channel decoding process is the inverse of the multi-channel encoding process. And reconstructing the multi-channel output signal according to the multi-channel group pair down-mixed decoding signal by utilizing the multi-channel side information.
As shown in fig. 9, if the encoding-side multi-channel encoding process further includes preprocessing, the decoding-side multi-channel decoding process further includes corresponding post-processing, for example: band spread decoding, inverse time domain noise shaping, inverse frequency domain noise shaping, inverse time frequency transformation, etc., to obtain a final output signal.
As can be seen from the foregoing illustration, the coding efficiency can be improved by performing silence flag information detection on the multi-channel input signal, determining the silence flag information, and performing subsequent coding processing, such as bit allocation, according to the silence flag information.
The embodiment of the application provides a method for generating a code stream carrying mute flags according to the characteristics of the input signal. The encoding end performs mute flag detection on the multi-channel input signal to determine the mute flag information, transmits the mute flag information to the decoding end, performs bit allocation according to the mute flag information, and encodes the multi-channel signal. The decoding end parses the mute flag information from the code stream, performs bit allocation according to the mute flag information, and decodes the multi-channel signal.
In this technical scheme, a mute flag is calculated for each input channel and used to guide the bit allocation of encoding and decoding. Whether the input signal is a mute frame is determined; if so, the channel is not encoded or is encoded with a small number of bits. The decibel value or loudness value of the signal at the input end is calculated and compared with a set hearing threshold: the mute flag is set to 1 below the hearing threshold, and to 0 otherwise. When the mute flag is 1, the channel is not encoded or is encoded with fewer bits, and the pre-quantization data of a channel with mute flag 1 may be cleared to 0. The mute flag is transmitted to the decoding end as side information to guide bit demultiplexing at the decoding end. The transmission syntax of the encoding end is as follows: HasSilFlag represents the mute flag enable and can be transmitted with 1 bit; when HasSilFlag = 1, the mute flag of each channel is further transmitted, and when HasSilFlag = 0, the mute flags of the channels are not transmitted. For example, for 5.1.4 channels, a 10-bit mute flag is transmitted in the multi-channel side information, 1 bit per channel, in the same order as the input channels. Other modules at the encoding end may modify a mute flag, changing it from 1 to 0, before it is transmitted in the code stream.
The embodiment of the application has the following advantages: mute flag detection is performed on the multi-channel input signal to determine the mute flag information, and subsequent encoding processing such as bit allocation is performed according to the mute flag information; a mute channel may be left uncoded or encoded with fewer bits, which saves coding bits and improves coding efficiency.
The mute flag information is transmitted to the decoding end, so that the decoding end can conveniently perform decoding processing, such as bit allocation, in a manner consistent with the encoding end.
In other embodiments of the present application, a hybrid coding improvement is described as follows:
A mixed-mode codec supports encoding and decoding of the sound bed signal and the object signal. The specific implementation scheme is divided into three parts:
Hybrid coding bit pre-allocation: the pre-allocated bit number bedAvailbleBytes of the sound bed signal and the pre-allocated bit number objAvailbleBytes of the object signal are obtained according to the multi-channel side information bedBitsRatio.
Hybrid coding bit allocation: this comprises four steps, in processing order: mute frame bit allocation, non-mute frame bit allocation adaptation, non-mute frame bit allocation, and non-mute frame bit allocation adaptation restoration.
Mute frame bit allocation: if a mute frame exists, bits are allocated to the mute frame channels according to the side information mute flag silFlag[i] and the mixed allocation strategy mixAllocStrategy, and the pre-allocated bit number bedAvailbleBytes of the sound bed signal and the pre-allocated total bit number objAvailbleBytes of the object signal are updated.
Non-mute frame bit allocation adaptation: the channel parameter sequences are mapped, which facilitates the non-mute frame bit allocation processing.
Non-mute frame bit allocation: bits are allocated according to the updated pre-allocated bit number bedAvailbleBytes of the sound bed signal, the updated pre-allocated bit number objAvailbleBytes of the object signal, and the channel bit allocation scale factors chBitRatios.
Non-mute frame bit allocation adaptation restoration: the channel parameter sequences are inversely mapped, which facilitates the subsequent interval decoding, inverse quantization and neural network inverse transformation steps.
Hybrid coding up-mix: M/S up-mixing is performed on the two channels ch1 and ch2 of the group pair indicated by the channel pair index to obtain the up-mixed channel signals.
The multi-channel stereo side information syntax, the DecodeMcSideBits() syntax, is shown in table 1 below.
[Table 1: DecodeMcSideBits() syntax (table not reproduced)]
The semantics are described as follows. bedBitsRatio occupies 4 bits and is the scale factor index representing the proportion of the sound bed signal in the total number of bits, with values 0-15; the corresponding floating-point scale factors are as follows:
1: 0.0625
2: 0.125
3: 0.1875
4: 0.25
5: 0.3125
6: 0.375
7: 0.4375
8: 0.5
9: 0.5625
10: 0.625
11: 0.6875
12: 0.75
13: 0.8125
14: 0.875
15: 0.9375.
mixAllocStrategy occupies 2 bits, representing the allocation strategy for the mixed signal of the sound bed signal and the object signal. The mixed allocation strategy may be predetermined, or predefined according to the coding parameters; the coding parameters include the coding rate and signal characteristic parameters, and are predetermined. The value range and meaning of the allocation strategy are as follows:
0: redundant sound bed bits generated by the Mute mechanism (mute flags) are given to the sound bed signal, and redundant object bits are given to the object signal; that is, the bits of mute sound bed channels are given to the non-mute sound bed channels.
1: redundant sound bed bits generated by the Mute mechanism are given to the sound bed signal, and redundant object bits are also given to the sound bed signal.
2: redundant sound bed bits generated by the Mute mechanism are given to the object signal, and redundant object bits are given to the object signal.
3: reserved.
HasSilFlag occupies 1 bit, 0 means that mute frame processing is turned off or there is no mute frame; 1 indicates that the mute frame processing is turned on and a mute frame is present.
silFlag i occupies 1 bit, and represents a silence frame flag of a corresponding channel, 0 represents a non-silence frame, and 1 represents a silence frame.
soundBedType occupies 1 bit and indicates the type of the sound bed: 0 means only object signals or none (only obj); 1 means a sound bed signal or an HOA signal (mc or HOA).
codingProfile occupies 3 bits: 0 indicates a mono, stereo or sound bed signal (mono/stereo/mc); 1 indicates a mix of sound bed and object signals (channel + obj mix); 2 indicates an HOA signal.
pairCnt occupies 4 bits and is used to represent the number of channel group pairs of the current frame.
The number of bits of the channel pair index depends on the total number of channels, see table 1 above. The channel pair index can be parsed to obtain the index values of the two channels in the current channel pair, i.e. ch1 and ch2.
mcIld [ ch1], mcIld [ ch2] occupies 4 bits, and the inter-channel level difference parameter of each channel in the current channel pair is used to restore the level of the decoded spectrum.
scaleF lag [ ch1], scaleF lag [ ch2] occupies 1 bit, represents a scaling flag parameter of each channel in the current channel pair, and represents whether the current channel amplitude is reduced or enlarged.
chBitRatios occupy 4 bits each, representing the bit allocation proportion of each channel.
The decoding process is as follows, with hybrid coded bit pre-allocation first.
The function of the hybrid coding bit pre-allocation module is to calculate, according to the scale factor index parameter decoded from the bit stream that represents the proportion of the sound bed signal in the total number of bits, the number of available bits remaining after other side information is deducted, so as to obtain the sound bed pre-allocated byte number and the object pre-allocated byte number for use by the subsequent modules.
The number of available bytes remaining after other side information of the current frame is deducted is denoted availableBytes; the sound bed pre-allocated byte number is bedAvailbleBytes, and the object pre-allocated byte number is objAvailbleBytes. The floating-point scale factor corresponding to bedBitsRatio is bedBitsRatioFloat; for the correspondence between bedBitsRatio and bedBitsRatioFloat, see the bedBitsRatio part of the foregoing semantics.
The formulas for calculating the sound bed pre-allocated byte number bedAvailbleBytes and the object pre-allocated byte number objAvailbleBytes from the available byte number availableBytes and the floating-point scale factor bedBitsRatioFloat of the sound bed signal in the total number of bits are as follows:
bedAvailbleBytes=floor(availableBytes*bedBitsRatioFloat);
objAvailbleBytes = availableBytes - bedAvailbleBytes;
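As an illustrative sketch only, the two formulas may be transcribed directly into C (floorf comes from <math.h>; the function name is an assumption):

    #include <math.h>

    /* Split the available bytes between the sound bed and the object signals. */
    static void preAllocateBytes(int availableBytes, float bedBitsRatioFloat,
                                 int *bedAvailbleBytes, int *objAvailbleBytes)
    {
        *bedAvailbleBytes = (int)floorf((float)availableBytes * bedBitsRatioFloat);
        *objAvailbleBytes = availableBytes - *bedAvailbleBytes;
    }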
The process of hybrid coding bit allocation is as follows. The hybrid coding bit allocation jointly allocates the available bit number to each down-mix channel in the hybrid-coded multi-channel stereo according to the bit allocation parameters, the available byte number and other parameters in the bit stream, so as to complete the subsequent interval decoding, inverse quantization and neural network inverse transformation steps. The hybrid coding bit allocation comprises the following parts:
Bit allocation of the mute frame channels. The mute frame channel bit allocation processing module completes the bit allocation of the mute frames of the mixed signal according to the allocation strategy parameter mixAllocStrategy of the mixed signal of the sound bed signal and the object signal decoded from the bit stream, and the mute frame flag parameters, namely the mute enable flag HasSilFlag and the mute flags silFlag, decoded from the bit stream.
Step 1: hybrid coding mute frame bit allocation processing.
The hybrid coding mute frame bit allocation processing sub-module completes the bit allocation of hybrid coding mute frames according to the mute frame flag parameters HasSilFlag and silFlag decoded from the bit stream. The following cases and corresponding processing exist:
Case 1: when HasSilFlag is parsed to be 0, mute frame processing is not enabled for the current frame or the current frame contains no mute frame, and the hybrid coding mute frame bit allocation processing sub-module performs no further operation.
Case 2: when HasSilFlag is parsed to be 1, mute frame processing is enabled for the current frame and a mute frame exists. In this case, silFlag[i] of all channels is traversed; when silFlag[i] is 1, the byte number channelBytes[i] of that channel is set to the minimum safe byte number safetyBytes. The value of safetyBytes is related to the requirement of the quantization and interval coding modules on the number of input bytes, and may here be set to, for example, 10 bytes.
The object pre-allocated byte number objAvailbleBytes is updated: the object channels with silFlag[i] of 1 are traversed, and for each such object channel the following operation is performed:
objAvailbleBytes-=safetyBytes;
The sound bed pre-allocated byte number bedAvailbleBytes is updated: the sound bed channels with silFlag[i] of 1 are traversed, and for each such sound bed channel the following operation is performed:
bedAvailbleBytes -= safetyBytes;
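As an illustrative sketch only, case 2 above may be written as follows in C; the assumption that the sound bed channels precede the object channels is made for illustration:

    /* Give every mute channel the minimum safe byte number and deduct it
       from the corresponding pre-allocated pool. */
    static void deductMuteFrames(const int *silFlag, int bedChNum, int objChNum,
                                 int *channelBytes,
                                 int *bedAvailbleBytes, int *objAvailbleBytes)
    {
        const int safetyBytes = 10;                 /* example value from the text */
        for (int ch = 0; ch < bedChNum + objChNum; ch++) {
            if (silFlag[ch] != 1) continue;
            channelBytes[ch] = safetyBytes;
            if (ch < bedChNum) *bedAvailbleBytes -= safetyBytes;
            else               *objAvailbleBytes -= safetyBytes;
        }
    }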
Step 2: mute frame remaining bit allocation strategy.
The function of the mute frame remaining bit allocation strategy sub-module is to decide, when a mute frame exists, whether the remaining bit number generated by the mute frames is allocated to the sound bed signal or the object signal according to the allocation strategy parameter mixAllocStrategy of the mixed signal of the sound bed signal and the object signal decoded from the bit stream. The specific allocation strategy is determined by the value of mixAllocStrategy; for the meaning of its values see the mixAllocStrategy part above.
The embodiment of the application supports 2 different mute frame remaining bit allocation strategies. First, a pre-calculation is performed:
The average byte number objAvgBytes allocated to each object channel is calculated from the object pre-allocated byte number objAvailbleBytes and the object channel number objNum, with the following formula:
objAvgBytes[i]=floor(objAvailbleBytes/objNum);
If bytes remain after the division, the remaining bytes are distributed one byte at a time to the object channels in ascending order of channel index: while sum(objAvgBytes[i]) < objAvailbleBytes, objAvgBytes[0] += 1 is performed, then the same operation is done for the next object channel, and so on, until sum(objAvgBytes[i]) == objAvailbleBytes.
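As an illustrative sketch only, the average-plus-remainder pre-calculation may be written as follows (the function name is an assumption):

    /* floor division followed by a one-byte remainder pass, low indices first,
       so that sum(objAvgBytes[i]) == objAvailbleBytes afterwards. */
    static void splitObjectBytes(int objAvailbleBytes, int objNum, int *objAvgBytes)
    {
        int base = objAvailbleBytes / objNum;
        int rest = objAvailbleBytes - base * objNum;
        for (int i = 0; i < objNum; i++)
            objAvgBytes[i] = base + (i < rest ? 1 : 0);
    }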
Scheme 1: when mixAllocStrategy is 0, the object mute frame remaining bits objSilLeftBytes are defined with an initial value of 0; the silFlag[i] corresponding to all object channels are traversed, and whenever silFlag[i] = 1 the value of objSilLeftBytes is updated, that is,
objSilLeftBytes += objAvgBytes[i] - safetyBytes;  0 <= i < objNum;
until all object channels have been traversed.
Scheme 2: when mixAllocStrategy is 1, the object mute frame remaining bits objSilLeftBytes are defined with an initial value of 0; the silFlag[i] corresponding to all object channels are traversed, and whenever silFlag[i] = 1 the value of objSilLeftBytes is updated, that is,
objSilLeftBytes += objAvgBytes[i] - safetyBytes;  0 <= i < objNum;
until all object channels have been traversed.
The sound bed pre-allocated byte number bedAvailbleBytes and the object pre-allocated byte number objAvailbleBytes are then updated, for example in the following manner:
bedAvailbleBytes+=objSilLeftBytes;
objAvailbleBytes-=objSilLeftBytes。
Non-mute frame bit pre-allocation adaptation. The input parameters of the non-mute frame channel bit allocation are mapped into a contiguous arrangement of channels (the existence of mute frame channels can cause the non-mute frame channels to be physically discontiguous), which facilitates the non-mute frame channel bit allocation processing of the subsequent module.
Bit allocation for non-mute frame channels. The bit allocation processing of the sound bed non-mute frame channels adopts a general bit allocation module, whose function is to allocate the available bit number to each down-mix channel in the sound bed object multi-channel stereo according to parameters such as the updated pre-allocated byte number bedAvailbleBytes and the channel bit allocation proportions.
The input available byte number is availableBytes. In the multi-channel stereo mode an LFE channel may exist; in general, the LFE channel carries little effective spectrum information, so it does not need to participate in the bit allocation process and a fixed bit number is pre-allocated to it. The pre-allocated bit number of the LFE channel is related to the coding rate. The average code rate of a channel pair, cpeRate, is the result of converting the total code rate to one channel pair (a sketch follows this paragraph). If cpeRate < 64 kb/s, the LFE channel is allocated 10 bytes; if cpeRate < 96 kb/s, 15 bytes; if cpeRate >= 96 kb/s, 20 bytes. If the LFE channel exists, its pre-allocated byte number is deducted from the available byte number availableBytes, and the bytes remaining after deduction are allocated to the channels other than the LFE channel.
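As an illustrative sketch only, the LFE pre-allocation rule reads as follows in C (the function name is an assumption; cpeRate is the per-channel-pair rate in kb/s):

    /* Fixed LFE byte budget, chosen by the average channel-pair code rate. */
    static int lfePreAllocBytes(float cpeRate)
    {
        if (cpeRate < 64.0f) return 10;
        if (cpeRate < 96.0f) return 15;
        return 20;                        /* cpeRate >= 96 kb/s */
    }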
The process of allocating the available byte number availableBytes to the remaining channels is divided into the following four steps (a sketch follows the step list):
in a first step, bits are allocated to the various channels according to chBitRatios.
The number of bytes per channel can be expressed as:
channelBytes[i]=availableBytes*chBitRatios[i]/(1<<4)。
where (1 < < 4) represents the maximum value range of the channel bit allocation ratio chbitrates.
In the second step, if not all bytes were allocated in the first step, the remaining bytes are allocated again to each channel according to the proportions expressed by chBitRatios[i].
In the third step, if bytes still remain after the second step, the remaining bytes are allocated to the channel that was allocated the most bytes in the first step.
In the fourth step, if the byte number allocated to some channel exceeds the upper limit of the byte number of a single channel, the excess is allocated to the remaining channels.
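As an illustrative sketch only, the four steps may be condensed as follows; maxChannelBytes (the single-channel upper limit) and the simple hand-over of the excess in step four are assumptions of this sketch:

    /* Four-step allocation with 4-bit ratios: ratio split, ratio split of the
       rest, remainder to the largest channel, clamp to the per-channel limit. */
    static void allocateBytesFourSteps(int availableBytes, const int *chBitRatios,
                                       int numCh, int maxChannelBytes, int *channelBytes)
    {
        int used = 0, maxIdx = 0;
        for (int i = 0; i < numCh; i++) {                    /* step one */
            channelBytes[i] = availableBytes * chBitRatios[i] / (1 << 4);
            used += channelBytes[i];
            if (channelBytes[i] > channelBytes[maxIdx]) maxIdx = i;
        }
        int rest = availableBytes - used;
        for (int i = 0; i < numCh && rest > 0; i++) {        /* step two */
            int extra = rest * chBitRatios[i] / (1 << 4);
            channelBytes[i] += extra;
            used += extra;
        }
        channelBytes[maxIdx] += availableBytes - used;       /* step three */
        for (int i = 0; i < numCh; i++) {                    /* step four */
            if (channelBytes[i] > maxChannelBytes) {
                int over = channelBytes[i] - maxChannelBytes;
                channelBytes[i] = maxChannelBytes;
                channelBytes[(i + 1) % numCh] += over;       /* give the excess away */
            }
        }
    }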
The bit allocation processing of the object non-mute frame channels adopts the same general bit allocation module, whose function is to allocate the available bit number to each down-mix channel in the sound bed object multi-channel stereo according to parameters such as the updated available byte number objAvailbleBytes and the channel bit allocation proportions. The specific bit allocation processing of the object non-mute frame channels is the same as that of the non-mute frame channels of the sound bed signal.
Non-mute frame channel adaptation restoration. The byte number parameters output by the non-mute frame channel bit allocation processing are inversely mapped back to the physical arrangement according to the rule (the existence of mute frame channels can cause the non-mute frame channels to be physically discontiguous), which facilitates the subsequent interval decoding, inverse quantization and neural network inverse transformation steps.
Hybrid coding up-mix. The two channels ch1 and ch2 of the group pair indicated by the channel pair index are up-mixed in a mid/side (M/S) manner consistent with the M/S up-mix of the two-channel stereo mode.
After M/S up-mixing, the modified discrete cosine transform (MDCT) spectrum of the up-mixed channels needs to undergo inverse inter-channel level difference (ILD) processing to restore the amplitude differences of the channels. In the inverse ILD processing, factor is the amplitude adjustment factor corresponding to the ILD parameter of the i-th channel, (1 << 4) is the maximum quantized value range of mcIld, and mdctSpectrum[i] denotes the MDCT coefficient vector of the i-th channel.
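The inverse ILD formula itself is not reproduced in the text above. Purely as an illustration, and not as the normative processing, one possible reading consistent with the surrounding description (a 4-bit mcIld index with maximum range (1 << 4), and a scaleFlag indicating reduction or enlargement) is sketched below; the linear dequantization and the use of scaleFlag are assumptions of this sketch:

    /* ASSUMPTION: linear dequantization of the 4-bit mcIld index, inverted
       when scaleFlag says the encoder reduced the amplitude. */
    static float invIldFactor(int mcIldIdx, int scaleFlag)
    {
        float ratio = (float)mcIldIdx / (float)(1 << 4);
        if (ratio <= 0.0f) ratio = 1.0f / (float)(1 << 4);   /* guard index 0 */
        return scaleFlag ? (1.0f / ratio) : ratio;
    }
    /* Each MDCT bin k of channel i is then scaled:
       mdctSpectrum[i][k] *= invIldFactor(mcIld[i], scaleFlag[i]); */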
The technical effect of the embodiment of the application is that, when the multi-channel signal is a mixed signal containing a sound bed signal and an object signal and the multi-channel signal contains mute frames, different allocation strategies mixAllocStrategy for the mixed signal containing the sound bed signal and the object signal are adopted, the bit number saved on the mute frames is allocated to other non-mute frames, and the coding efficiency is improved.
The improvement of the embodiment of the application is: the pre-allocated bit number bedAvailbleBytes of the sound bed and the pre-allocated total bit number objAvailbleBytes of the object are determined; whether a mute frame is included in the sound bed and the object is determined; if a mute frame exists, bits are allocated to the mute frame channels according to the side information silFlag[i] and mixAllocStrategy, and bedAvailbleBytes and objAvailbleBytes are updated.
The embodiment of the application provides a bit allocation code stream method in the sound bed object mixed mode: the allocation strategy mixAllocStrategy of the mixed signal including the sound bed signal and the object signal is parsed from the code stream, and bits are allocated to the mute frame channels according to the allocation strategy of the mixed signal including the sound bed signal and the object signal.
The pre-allocated bit number bedAvailbleBytes of the sound bed and the pre-allocated total bit number objAvailbleBytes of the object are determined; whether a mute frame is included in the sound bed and the object is determined; if a mute frame exists, bits are allocated to the mute frame channels according to the side information silFlag[i] and mixAllocStrategy, and bedAvailbleBytes and objAvailbleBytes are updated.
The mute flag information (including HasSilFlag and silFlag[i]) is parsed from the code stream, and whether a mute frame exists is determined according to the mute flag information.
Bits are allocated to the mute frame channels according to the side information silFlag[i] and mixAllocStrategy, and the pre-allocated bit number bedAvailbleBytes of the sound bed and the pre-allocated total bit number objAvailbleBytes of the object are updated.
Whether the remaining bit number generated by the mute frames is allocated to the sound bed signal or the object signal is determined according to the parsed allocation strategy parameter mixAllocStrategy of the mixed signal including the sound bed signal and the object signal.
mixAllocStrategy occupies 2 bits and represents the allocation strategy of the mixed signal including the sound bed signal and the object signal. The value range and meaning are as follows:
0: redundant bits generated by the Mute mechanism that belong to the sound bed signal are allocated to other sound bed signals, and redundant bits that belong to the object signal are allocated to other object signals.
1: redundant bits generated by the Mute mechanism that belong to the sound bed signal are allocated to other sound bed signals, and redundant bits that belong to the object signal are also allocated to other sound bed signals.
2: redundant bits generated by the Mute mechanism that belong to the sound bed signal are allocated to other object signals, and redundant bits that belong to the object signal are also allocated to other object signals.
3: reserved.
The 2 specific remaining bit allocation methods correspond to the 2 different mute frame remaining bit allocation strategies. When the multi-channel signal is a mixed signal containing a sound bed signal and an object signal, if the object signal is treated as a sound bed signal and bit allocation is performed jointly according to a unified bit allocation strategy, the sound bed signal and the object signal affect each other and the quality is poor.
The embodiment of the application provides a bit allocation code stream method in the sound bed object mixed mode, comprising the following steps:
when the multi-channel signal is a mixed signal containing a sound bed signal and an object signal, a bit allocation scale factor is obtained by decoding the code stream, the bit allocation scale factor being used to represent the relation between the coding bit number of the sound bed signal and/or the object channel signal and the total available bit number;
the pre-allocated bit number bedAvailbleBytes of the sound bed signal and the pre-allocated bit number objAvailbleBytes of the object signal are determined according to the bit allocation scale factor;
the bit allocation number of each channel is determined according to the pre-allocated bit number bedAvailbleBytes of the sound bed signal and the pre-allocated bit number objAvailbleBytes of the object signal;
decoding is performed according to the bit allocation number of each channel and the code stream to obtain the decoded multi-channel signal.
The bit allocation scale factor is the scale factor of the coding bit number of the sound bed signal to the total available bit number (bedBitsRatioFloat in the embodiment), or the scale factor of the coding bit number of the object signal to the total available bit number, or the ratio of the coding bit number of the sound bed signal to the coding bit number of the object signal, or the ratio of the coding bit number of the object signal to the coding bit number of the sound bed signal.
When the bit allocation scale factor is the scale factor of the coding bit number of the sound bed signal to the total available bit number, the specific method for determining the bit allocation scale factor is as follows: the bit allocation scale factor index (bedBitsRatio in the embodiment) is parsed from the code stream, and the bit allocation scale factor (bedBitsRatioFloat in the embodiment) is determined from the bit allocation scale factor index.
The bit allocation scale factor index may be a coded index obtained by uniformly quantizing and coding the bit allocation scale factor, or may be a coded index obtained by non-uniformly quantizing and coding the bit allocation scale factor.
The bit allocation scale factor index and the bit allocation scale factor may be linear or non-linear.
The formulas for calculating the sound bed pre-allocated byte number bedAvailbleBytes and the object pre-allocated byte number objAvailbleBytes from the available byte number availableBytes and the floating-point scale factor bedBitsRatioFloat of the sound bed in the total number of bits are as follows:
bedAvailbleBytes=floor(availableBytes*bedBitsRatioFloat);
objAvailbleBytes = availableBytes - bedAvailbleBytes;
The mute flag information (including HasSilFlag and silFlag[i]) is parsed from the code stream, bit allocation is performed according to the pre-allocated bit number bedAvailbleBytes of the sound bed signal, the pre-allocated bit number objAvailbleBytes of the object signal and the mute flag information, and the bit allocation number of each channel is determined.
Steps of hybrid coding bit allocation: whether a mute frame exists is determined according to the mute flag information; if a mute frame exists, bits are allocated to the mute frame channels according to the side information silFlag[i] (and mixAllocStrategy), and the pre-allocated bit number bedAvailbleBytes of the sound bed signal and the pre-allocated total bit number objAvailbleBytes of the object signal are updated; bits are then allocated to the non-mute frame channels according to the non-mute frame bit allocation principle (comprising the three steps of non-mute frame bit allocation adaptation, non-mute frame bit allocation, and non-mute frame bit allocation adaptation restoration).
The encoding end determines a bit allocation scale factor;
carrying out quantization coding on the factor to obtain an index of a bit allocation scale factor;
the index is written into the code stream.
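As an illustrative sketch only, a uniform 4-bit quantization consistent with the 1/16-step table given earlier may be written as follows (the function name is an assumption):

    /* Round the scale factor to the nearest multiple of 1/16 and clamp
       the result to the 4-bit index range 0..15. */
    static int quantizeBedBitsRatio(float bedBitsRatioFloat)
    {
        int idx = (int)(bedBitsRatioFloat * 16.0f + 0.5f);
        if (idx < 0)  idx = 0;
        if (idx > 15) idx = 15;
        return idx;                       /* bedBitsRatio index written to the stream */
    }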
The bit allocation scale factor index and the bit allocation scale factor may be linear or non-linear.
Alternatively, the scale factor may be predefined according to the coding parameters. The coding parameters include the coding rate and signal characteristic parameters; the coding parameters may be predetermined, or adaptively determined according to the characteristics of the signal of each frame, for example the signal type.
The coding end determines a mixed allocation strategy, and the mixed allocation strategy is carried in the code stream. The encoding end sends the encoded data to the decoding end.
When the mute enable flags include an object mute enable flag and a sound bed mute enable flag, the allocation strategy of the sound bed object mixed signal may also include other modes, such as:
Mode 1: the object mute enable flag is 1, and the redundant bits generated by the presence of mute channels in the object signal are allocated to other non-mute channels among the object channels;
Mode 2: the object mute enable flag is 1, and the redundant bits generated by the presence of mute channels in the object signal are allocated to the channels where the sound bed signal is located;
Mode 3: the sound bed mute enable flag is 1, and the redundant bits generated by the presence of mute channels in the sound bed signal are allocated to other non-mute channels among the sound bed channels;
Mode 4: the sound bed mute enable flag is 1, and the redundant bits generated by the presence of mute channels in the sound bed signal are allocated to the channels where the object signal is located;
Mode 5: the sound bed mute enable flag and the object mute enable flag are both 1, and the redundant bits generated by the presence of mute channels in the object signal are allocated to other non-mute channels among the object channels;
Mode 6: the sound bed mute enable flag and the object mute enable flag are both 1, and the redundant bits generated by the presence of mute channels in the object signal are allocated to other non-mute channels among the sound bed channels.
In other embodiments of the present application, the mixed signal coding improvement scheme is as follows:
The mixed signal coding mode in the AVS3P3 standard supports encoding and decoding of the sound bed signal and the object signal. In practical applications a large number of mute frames exist in the sound bed signal and the object signal, and reasonable processing of the mute frames can effectively improve the coding efficiency of the mixed signal. This scheme therefore provides an efficient coding method for the mixed signal, which improves the coding quality of the mixed signal through reasonable bit allocation between the mute frames and the non-mute frames in the sound bed signal and the object signal. Meanwhile, the bit allocation strategy of the mixed signal is implemented at the encoding end, and the decoding end does not distinguish between the sound bed and the object in the bit allocation step. The specific implementation scheme comprises the following:
The mute enable flag is denoted HasSilFlag, and the mute flag of the i-th channel is denoted silFlag[i]. The mute enable flag applies to the channel signals of the multi-channel signal other than the LFE channel signal: HasSilFlag indicates whether a mute frame exists among the channels other than the LFE channel, and the silFlag corresponding to each channel other than the LFE channel indicates whether that channel is a mute frame.
The channels covered by chBitRatios[i] change from the non-LFE channels to the non-LFE non-mute channels; the number of bits of chBitRatios[i] changes from 4 to 6.
The ILD side information changes from a 4-bit inter-channel amplitude difference parameter and a 1-bit scaling flag parameter to a 5-bit scale factor codebook index.
The multi-channel stereo decoding syntax, the Avs3McDec() syntax, is shown in table 2 below.
[Table 2: Avs3McDec() multi-channel stereo decoding syntax (table not reproduced)]
The multi-channel stereo side information syntax, the DecodeMcSideBits() syntax, is shown in table 3 below.
[Table 3: DecodeMcSideBits() syntax (table not reproduced)]
The semantics are as follows. McBitsAllocationHasSiL() performs the multi-channel stereo bit allocation.
coupleChNum is the number of all channels of the multi-channel signal other than the LFE channel.
HasSilFlag occupies 1 bit, and indicates whether or not there is a mute frame in each channel of the current frame of the audio signal, 0 indicates that there is no mute frame, and 1 indicates that there is a mute frame.
silFlag[i] occupies 1 bit; 0 indicates that the i-th channel is a non-mute frame, and 1 indicates that the i-th channel is a mute frame.
mcIld [ ch1], mcIld [ ch2] occupy 5 bits, and the codebook index quantized by inter-channel amplitude difference ILD parameters of each channel in the current channel pair is used for recovering the amplitude of the decoded spectrum.
pairCnt occupies 4 bits and is used to represent the number of channel group pairs of the current frame.
The channel pair index is denoted channelPairIndex; the number of channelPairIndex bits depends on the total number of channels, see note 1 in the table above. channelPairIndex can be parsed to obtain the index values of the two channels in the current channel pair, i.e. ch1 and ch2.
chBitRatios occupy 6 bits each, representing the bit allocation proportion of each channel.
The decoding process is as follows:
Mixed signal bit allocation. The mixed signal bit allocation allocates the available bit number remaining after other side information is deducted to each down-mix channel of the multi-channel stereo according to the mute channel flags and the bit allocation proportion parameters decoded from the bit stream, so as to complete the subsequent interval decoding, inverse quantization and neural network inverse transformation steps.
The number of available bytes remaining after the current frame is subtracted from other side information is noted as availableBytes.
A mute channel may exist in the multi-channel stereo mode; the mute channel does not need to participate in the bit allocation process of the multi-channel stereo mode, and a fixed byte number, namely 8 bytes, is pre-allocated to it. If a mute channel exists, its pre-allocated byte number is deducted from the available byte number availableBytes, and the remaining bytes after deduction are allocated to the channels other than the mute channel.
The process of allocating the available byte number availableBytes to the remaining channels is divided into the following five steps (a sketch follows the step list):
In the first step, the safe byte number, namely 8 bytes, is pre-allocated to each channel. The safe byte numbers are deducted from the available byte number availableBytes, and the bytes remaining after deduction are allocated in the subsequent steps.
In the second step, bits are allocated to the channels according to chBitRatios; the byte number of each channel can be expressed as:
channelBytes[i]=availableBytes*chBitRatios[i]/(1<<6)。
where (1 < < 6) represents the maximum value range of the channel bit allocation ratio chbitrates.
In the third step, if not all bytes were allocated in the second step, the remaining bytes are allocated again to each channel according to the proportions expressed by chBitRatios[i].
In the fourth step, if bytes still remain after the third step, the remaining bytes are allocated to the channel that was allocated the most bytes in the ratio allocation step.
In the fifth step, if the byte number allocated to some channel exceeds the upper limit of the byte number of a single channel, the excess is allocated to the remaining channels.
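As an illustrative sketch only, the first two steps of this five-step procedure may be written as follows; the remaining steps follow the same redistribution pattern as the four-step sketch given earlier (the function name is an assumption):

    /* Set aside the fixed safe bytes per channel, then split what remains
       according to the 6-bit ratios, range (1 << 6). */
    static void mixedAllocFirstSteps(int availableBytes, const int *chBitRatios,
                                     int numCh, int *channelBytes)
    {
        const int safeBytes = 8;                      /* per-channel pre-allocation */
        availableBytes -= safeBytes * numCh;
        for (int i = 0; i < numCh; i++)
            channelBytes[i] = safeBytes + availableBytes * chBitRatios[i] / (1 << 6);
    }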
Next, the up-mix process is described. The two channels ch1 and ch2 of the group pair indicated by the channel pair index are M/S up-mixed in a manner consistent with the M/S up-mix of the two-channel stereo mode. After M/S up-mixing, the MDCT spectrum of the up-mixed channels needs to undergo inverse ILD processing to restore the amplitude differences of the channels; the pseudo code of the inverse ILD processing is as follows:
factor = mcIldCodebook[mcIld[i]];
mdctSpectrum[i] = factor * mdctSpectrum[i];
Here, factor is the amplitude adjustment factor corresponding to the ILD parameter of the i-th channel, mcIldCodebook is the quantization codebook of the ILD parameter shown in table 4 below, mcIld[i] denotes the codebook index corresponding to the ILD parameter of the i-th channel, and mdctSpectrum[i] denotes the MDCT coefficient vector of the i-th channel. Table 4 is the mcILD code table:
[Table 4: mcILD code table (table not reproduced)]
it should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
In order to facilitate better implementation of the above-described aspects of embodiments of the present application, the following provides related devices for implementing the above-described aspects.
Referring to fig. 10, an encoding apparatus 1000 according to an embodiment of the present application may include: a mute flag information acquisition module 1001, a multi-channel encoding module 1002, and a code stream generation module 1003, wherein,
A mute flag information acquisition module, configured to acquire mute flag information of a multi-channel signal, where the mute flag information includes: a mute enable flag, and/or a mute flag;
the multichannel coding module is used for carrying out multichannel coding processing on the multichannel signals so as to obtain transmission channel signals of all transmission channels;
a code stream generating module, configured to generate a code stream according to the transmission channel signals of the transmission channels and the silence flag information, where the code stream includes: and the mute label information and the multi-channel coding result of the transmission channel signal.
Referring to fig. 11, a decoding apparatus 1100 according to an embodiment of the present application may include: a parsing module 1101, and a processing module 1102, wherein,
the analysis module is used for analyzing the mute label information from the code stream of the encoding equipment and determining the encoding information of each transmission channel according to the mute label information, wherein the mute label information comprises: a mute enable flag, and/or a mute flag;
the processing module is used for decoding the coding information of each transmission channel to obtain a decoding signal of each transmission channel;
the processing module is further configured to perform multi-channel decoding processing on the decoded signals of the transmission channels, so as to obtain multi-channel decoded output signals.
It should be noted that, because the content of information interaction and execution process between the modules/units of the above-mentioned device is based on the same concept as the method embodiment of the present application, the technical effects brought by the content are the same as the method embodiment of the present application, and the specific content can be referred to the description in the foregoing illustrated method embodiment of the present application, which is not repeated herein.
The embodiment of the application also provides a computer storage medium, wherein the computer storage medium stores a program, and the program executes part or all of the steps described in the embodiment of the method.
Next, another encoding apparatus provided in an embodiment of the present application is described, referring to fig. 12, an encoding apparatus 1200 includes:
a receiver 1201, a transmitter 1202, a processor 1203 and a memory 1204 (where the number of processors 1203 in the encoding apparatus 1200 may be one or more, one processor being exemplified in fig. 12). In some embodiments of the application, the receiver 1201, the transmitter 1202, the processor 1203, and the memory 1204 may be connected by a bus or other means, where a bus connection is illustrated in FIG. 12.
The memory 1204 may include read only memory and random access memory, and provides instructions and data to the processor 1203. A portion of the memory 1204 may also include non-volatile random access memory (non-volatile random access memory, NVRAM). The memory 1204 stores an operating system and operating instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, where the operating instructions may include various operating instructions for performing various operations. The operating system may include various system programs for implementing various underlying services and handling hardware-based tasks.
The processor 1203 controls the operation of the encoding apparatus, and the processor 1203 may also be referred to as a central processing unit (central processing unit, CPU). In a specific application, the individual components of the coding device are coupled together by a bus system, which may comprise, in addition to a data bus, a power bus, a control bus, a status signal bus, etc. For clarity of illustration, however, the various buses are referred to in the figures as bus systems.
The method disclosed in the above embodiment of the present application may be applied to the processor 1203 or implemented by the processor 1203. The processor 1203 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the method described above may be performed by integrated logic circuitry in hardware or instructions in software in the processor 1203. The processor 1203 described above may be a general purpose processor, a digital signal processor (digital signal processing, DSP), an application specific integrated circuit (application specific integrated circuit, ASIC), a field-programmable gate array (field-programmable gate array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. The disclosed methods, steps, and logic blocks in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in the memory 1204, and the processor 1203 reads the information in the memory 1204 and performs the steps of the above method in combination with its hardware.
The receiver 1201 may be configured to receive input digital or character information and to generate signal inputs related to the relevant settings and function control of the encoding device, the transmitter 1202 may include a display device such as a display screen, and the transmitter 1202 may be configured to output the digital or character information via an external interface.
In an embodiment of the present application, the processor 1203 is configured to perform the method performed by the encoding apparatus as shown in fig. 4, 6, and 7 in the foregoing embodiment.
Next, another decoding apparatus provided in an embodiment of the present application, referring to fig. 13, a decoding apparatus 1300 includes:
a receiver 1301, a transmitter 1302, a processor 1303 and a memory 1304 (where the number of processors 1303 in the decoding apparatus 1300 may be one or more, one processor is exemplified in fig. 13). In some embodiments of the application, the receiver 1301, transmitter 1302, processor 1303 and memory 1304 may be connected by a bus or other means, where a bus connection is illustrated in FIG. 13.
Memory 1304 may include read only memory and random access memory and provides instructions and data to processor 1303. A portion of the memory 1304 may also include NVRAM. The memory 1304 stores an operating system and operating instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, wherein the operating instructions may include various operating instructions for performing various operations. The operating system may include various system programs for implementing various underlying services and handling hardware-based tasks.
The processor 1303 controls the operation of the decoding device, and the processor 1303 may also be referred to as a CPU. In a specific application, the components of the decoding device are coupled together through a bus system, where the bus system may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus. For clarity of description, however, the various buses are all labeled as the bus system in the figure.
The method disclosed in the foregoing embodiments of this application may be applied to the processor 1303, or implemented by the processor 1303. The processor 1303 may be an integrated circuit chip with a signal processing capability. In an implementation process, the steps of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 1303 or by instructions in the form of software. The processor 1303 may be a general purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed with reference to the embodiments of this application may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1304, and the processor 1303 reads information in the memory 1304 and completes the steps of the foregoing method in combination with hardware of the processor 1303.
In this embodiment of this application, the processor 1303 is configured to perform the methods performed by the decoding device shown in FIG. 5, FIG. 8, and FIG. 9 in the foregoing embodiments.
In another possible design, when the encoding device or the decoding device is a chip in a terminal, the chip includes a processing unit and a communication unit. The processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit may execute computer-executable instructions stored in a storage unit, so that the chip in the terminal performs the audio encoding method in any one of the foregoing first aspect or the audio decoding method in any one of the foregoing second aspect. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; or the storage unit may be a storage unit that is in the terminal and that is located outside the chip, such as a read-only memory (read-only memory, ROM) or another type of static storage device that can store static information and instructions, or a random access memory (random access memory, RAM).
The processor mentioned in any of the foregoing may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits configured to control program execution of the method in the first aspect or the second aspect.
It should be further noted that the described apparatus embodiments are merely examples. The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules may be selected according to an actual need to achieve the objectives of the solutions of the embodiments. In addition, in the accompanying drawings of the apparatus embodiments provided in this application, a connection relationship between modules indicates that the modules have a communication connection with each other, which may be specifically implemented as one or more communication buses or signal lines.
Based on the foregoing description of the embodiments, a person skilled in the art may clearly understand that this application may be implemented by software in addition to necessary general purpose hardware, or certainly by dedicated hardware, including an application specific integrated circuit, a dedicated CPU, a dedicated memory, dedicated components, and the like. Generally, any function performed by a computer program can be easily implemented by corresponding hardware, and a specific hardware structure used to implement a same function may be in various forms, for example, an analog circuit, a digital circuit, or a dedicated circuit. In most cases, however, a software program implementation is a better implementation for this application. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc of a computer, and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods in the embodiments of this application.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used for implementation, all or some of the embodiments may be implemented in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, all or some of the procedures or functions according to the embodiments of this application are produced. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or a data center, integrating one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (solid state disk, SSD)), or the like.

Claims (29)

1. A method of encoding a multi-channel signal, comprising:
acquiring mute flag information of a multi-channel signal, wherein the mute flag information comprises: a mute enable flag, and/or a mute flag;
performing multi-channel encoding processing on the multi-channel signal to obtain a transmission channel signal of each transmission channel; and
generating a code stream according to the transmission channel signal of each transmission channel and the mute flag information, wherein the code stream comprises: the mute flag information and a multi-channel encoding result of the transmission channel signal.
2. The method according to claim 1, wherein the multi-channel signal comprises: an acoustic bed signal, and/or an object signal; and
the mute flag information comprises: the mute enable flag, and the mute enable flag comprises: a global mute enable flag, or a partial mute enable flag, wherein
the global mute enable flag is a mute enable flag acting on the multi-channel signal; or
the partial mute enable flag is a mute enable flag acting on some channels of the multi-channel signal.
3. The method according to claim 2, wherein when the mute enable flag is the partial mute enable flag,
the partial mute enable flag is an object mute enable flag acting on the object signal; or the partial mute enable flag is an acoustic bed mute enable flag acting on the acoustic bed signal; or the partial mute enable flag is a mute enable flag acting on channel signals in the multi-channel signal other than a low frequency effects (LFE) channel signal; or the partial mute enable flag is a mute enable flag acting on channel signals participating in group-pair processing in the multi-channel signal.
4. The method according to any one of claims 1 to 3, wherein the multi-channel signal comprises: an acoustic bed signal and an object signal;
the mute flag information comprises: the mute enable flag, and the mute enable flag comprises: an acoustic bed mute enable flag and an object mute enable flag; and
the mute enable flag occupies a first bit and a second bit, wherein the first bit is used to carry a value of the acoustic bed mute enable flag, and the second bit is used to carry a value of the object mute enable flag.
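For illustration only (this sketch is not part of the claims), the two-bit field described in claim 4 could be packed and unpacked as follows; the bit ordering (acoustic bed in the least significant bit) and the function names are assumptions.

```python
def pack_mute_enable(bed_mute_enable: int, obj_mute_enable: int) -> int:
    """Pack the acoustic bed mute enable flag into the first bit and the
    object mute enable flag into the second bit (bit order assumed)."""
    return (bed_mute_enable & 1) | ((obj_mute_enable & 1) << 1)


def unpack_mute_enable(field: int) -> tuple[int, int]:
    """Recover (bed_mute_enable, obj_mute_enable) from the two-bit field."""
    return field & 1, (field >> 1) & 1
```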
5. The method according to any one of claims 1 to 4, wherein the mute flag information comprises: the mute enable flag; and
the mute enable flag is used to indicate whether a mute flag detection function is enabled; or
the mute enable flag is used to indicate whether each channel of the multi-channel signal needs to be transmitted; or
the mute enable flag is used to indicate whether each channel of the multi-channel signal is a non-mute channel.
6. The method according to any one of claims 1 to 5, wherein the acquiring mute flag information of a multi-channel signal comprises:
acquiring the mute flag information according to control signaling input to an encoding device; or
acquiring the mute flag information according to an encoding parameter of the encoding device; or
performing mute flag detection on each channel of the multi-channel signal to obtain the mute flag information.
7. The method according to claim 6, wherein the mute flag information comprises: the mute enable flag and the mute flag; and
the performing mute flag detection on each channel of the multi-channel signal to obtain the mute flag information comprises:
performing mute flag detection on each channel of the multi-channel signal to obtain a mute flag of each channel; and
determining the mute enable flag according to the mute flag of each channel.
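As an illustrative sketch, claim 7 leaves the aggregation rule open; one plausible reading is that the mute enable flag is set as soon as any channel is detected as mute, so that per-channel mute flags are carried in the code stream only when at least one channel can be skipped or low-bit encoded. The function name and the rule below are assumptions.

```python
def derive_mute_enable(channel_mute_flags: list[int]) -> int:
    """Assumed aggregation for claim 7: set the mute enable flag when at
    least one per-channel mute flag is 1; otherwise leave it cleared."""
    return 1 if any(flag == 1 for flag in channel_mute_flags) else 0
```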
8. The method according to claim 1, wherein the mute flag information comprises: the mute flag; or the mute flag information comprises: the mute enable flag and the mute flag; and
the mute flag is used to indicate whether each channel acted on by the mute enable flag is a mute channel, wherein the mute channel is a channel that does not need to be encoded or a channel that needs to be encoded with a low number of bits.
9. The method according to any one of claims 1 to 8, wherein before the acquiring mute flag information of a multi-channel signal, the method further comprises:
preprocessing the multi-channel signal to obtain a preprocessed multi-channel signal, wherein the preprocessing comprises at least one of the following: transient detection, windowing decision, time-frequency transform, frequency domain noise shaping, time domain noise shaping, and band extension encoding; and
the acquiring mute flag information of a multi-channel signal comprises:
performing mute flag detection on the preprocessed multi-channel signal to obtain the mute flag information.
10. The method according to any one of claims 1 to 8, further comprising:
preprocessing the multi-channel signal to obtain a preprocessed multi-channel signal, wherein the preprocessing comprises at least one of the following: transient detection, windowing decision, time-frequency transform, frequency domain noise shaping, time domain noise shaping, and band extension encoding; and
correcting the mute flag information according to the preprocessed multi-channel signal.
11. The method according to any one of claims 1 to 10, wherein the generating a code stream according to the transmission channel signal of each transmission channel and the mute flag information comprises:
adjusting an initial multi-channel processing mode according to the mute flag information to obtain an adjusted multi-channel processing mode; and
encoding the transmission channel signal of each transmission channel according to the adjusted multi-channel processing mode to obtain the code stream.
12. The method according to any one of claims 1 to 10, wherein the generating a code stream according to the transmission channel signal of each transmission channel and the mute flag information comprises:
performing bit allocation for each transmission channel according to the mute flag information, a quantity of available bits, and multi-channel side information to obtain a bit allocation result of each transmission channel; and
encoding the transmission channel signal of each transmission channel according to the bit allocation result of each transmission channel to obtain the code stream.
13. The method according to claim 12, wherein the performing bit allocation for each transmission channel according to the mute flag information, a quantity of available bits, and multi-channel side information comprises:
performing bit allocation for each transmission channel according to the quantity of available bits and the multi-channel side information and according to a bit allocation policy corresponding to the mute flag information.
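For illustration, one bit allocation policy consistent with claims 12 and 13 is sketched below: mute channels receive at most a fixed low-bit budget, and the remaining available bits are divided among the non-mute channels according to the channel bit allocation proportion from the multi-channel side information. The policy, the parameter names, and the rounding are assumptions, not the claimed method.

```python
def allocate_bits(mute_flags, proportions, available_bits, low_bit_budget=0):
    """Assumed policy: mute channels get `low_bit_budget` bits (0 means the
    channel is not encoded); non-mute channels share the remaining bits in
    proportion to the side-information values. Rounding is kept simple."""
    bits = [low_bit_budget if flag else 0 for flag in mute_flags]
    remaining = available_bits - sum(bits)
    active = [p if flag == 0 else 0.0 for flag, p in zip(mute_flags, proportions)]
    total = sum(active) or 1.0  # avoid division by zero when all channels are mute
    for i, p in enumerate(active):
        if p > 0.0:
            bits[i] += int(remaining * p / total)
    return bits
```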
14. The method according to claim 12, wherein the multi-channel side information comprises: a channel bit allocation proportion,
wherein the channel bit allocation proportion is used to indicate a bit allocation proportion among non-low frequency effects (LFE) channels in the multi-channel signal.
15. The method according to claim 6 or 7, wherein the performing mute flag detection on each channel of the multi-channel signal comprises:
determining signal energy of each channel of a current frame according to a signal of each channel of the current frame of the multi-channel signal;
determining a silence detection parameter of each channel of the current frame according to the signal energy of each channel of the current frame; and
determining a mute flag of each channel of the current frame according to the silence detection parameter of each channel of the current frame and a preset silence detection threshold.
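The three detection steps of claim 15 could be realized as in the following sketch. The claim does not fix how the silence detection parameter is derived from the signal energy, so the mean energy per sample is assumed here, and the threshold value is left to the caller.

```python
import numpy as np


def detect_mute_flags(frame: np.ndarray, threshold: float) -> list[int]:
    """Sketch of claim 15 for one frame of shape (num_channels, frame_len):
    per-channel signal energy -> assumed detection parameter -> threshold test."""
    energy = np.sum(frame.astype(np.float64) ** 2, axis=1)  # signal energy per channel
    detection_param = energy / frame.shape[1]               # assumed: mean energy per sample
    return [1 if p < threshold else 0 for p in detection_param]
```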
16. The method according to any one of claims 1 to 15, wherein the performing multi-channel encoding processing on the multi-channel signal to obtain a transmission channel signal of each transmission channel comprises:
performing multi-channel signal screening on the multi-channel signal to obtain a screened multi-channel signal;
performing group-pair processing on the screened multi-channel signal to obtain a multi-channel group-pair signal and multi-channel side information; and
performing downmix processing on the multi-channel group-pair signal according to the multi-channel side information to obtain the transmission channel signal of each transmission channel.
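Claim 16 does not fix the downmix; a mid/side transform applied to each channel group pair is one common choice and is sketched below under that assumption.

```python
import numpy as np


def downmix_pair(left: np.ndarray, right: np.ndarray):
    """Assumed mid/side downmix of one channel group pair, producing two
    transmission channel signals per pair (the claim leaves this open)."""
    mid = (left + right) * 0.5
    side = (left - right) * 0.5
    return mid, side
```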
17. The method according to claim 16, wherein the multi-channel side information comprises at least one of the following: an inter-channel amplitude difference parameter quantization codebook index, a quantity of channel group pairs, and a channel pair index, wherein
the inter-channel amplitude difference parameter quantization codebook index is used to indicate an inter-channel amplitude difference (ILD) parameter quantization codebook index of each channel in the multi-channel signal;
the quantity of channel group pairs is used to represent a quantity of channel group pairs of a current frame of the multi-channel signal; and
the channel pair index is used to represent an index of a channel pair.
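The side information fields enumerated in claim 17 could be grouped into a single structure as below; the field names are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class MultiChannelSideInfo:
    """Hypothetical container for the side information of claim 17."""
    ild_codebook_index: list[int]  # ILD parameter quantization codebook index per channel
    num_channel_pairs: int         # quantity of channel group pairs in the current frame
    channel_pair_index: list[int]  # index of each channel pair
```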
18. A method of decoding a multi-channel signal, comprising:
parsing mute flag information from a code stream of an encoding device, and determining encoding information of each transmission channel according to the mute flag information, wherein the mute flag information comprises: a mute enable flag, and/or a mute flag;
decoding the encoding information of each transmission channel to obtain a decoded signal of each transmission channel; and
performing multi-channel decoding processing on the decoded signal of each transmission channel to obtain a multi-channel decoded output signal.
19. The method according to claim 18, wherein the parsing mute flag information from a code stream of an encoding device comprises:
parsing a mute flag of each channel from the code stream; or
parsing the mute enable flag from the code stream, and parsing the mute flag from the code stream if the mute enable flag is a first value; or
parsing an acoustic bed mute enable flag and/or an object mute enable flag, and a mute flag of each channel, from the code stream; or
parsing an acoustic bed mute enable flag and/or an object mute enable flag from the code stream, and parsing mute flags of some of the channels from the code stream according to the acoustic bed mute enable flag and/or the object mute enable flag.
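The second parsing branch of claim 19 (parse the mute enable flag, then parse the per-channel mute flags only when the enable flag takes the first value) could proceed as in the sketch below; the bit order, the choice of 1 as the first value, and the reader class are all assumptions.

```python
class BitReader:
    """Minimal big-endian bit reader, used only for this sketch."""

    def __init__(self, data: bytes):
        self.data, self.pos = data, 0

    def read_bit(self) -> int:
        byte, offset = divmod(self.pos, 8)
        self.pos += 1
        return (self.data[byte] >> (7 - offset)) & 1


def parse_mute_info(reader: BitReader, num_channels: int):
    """Assumed parsing for claim 19: per-channel mute flags are present in
    the code stream only when the mute enable flag equals 1."""
    mute_enable = reader.read_bit()
    if mute_enable == 1:
        mute_flags = [reader.read_bit() for _ in range(num_channels)]
    else:
        mute_flags = [0] * num_channels
    return mute_enable, mute_flags
```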
20. The method according to claim 18, wherein the decoding the encoding information of each transmission channel comprises:
parsing multi-channel side information from the code stream;
performing bit allocation for each transmission channel according to the multi-channel side information and the mute flag information to obtain a quantity of encoding bits of each transmission channel; and
decoding the encoding information of each transmission channel according to the quantity of encoding bits of each transmission channel.
21. The method according to claim 18, wherein after the performing multi-channel decoding processing on the decoded signal of each transmission channel to obtain a multi-channel decoded output signal, the method further comprises:
performing post-processing on the multi-channel decoded output signal, wherein the post-processing comprises at least one of the following: band extension decoding, inverse time domain noise shaping, inverse frequency domain noise shaping, and inverse time-frequency transform.
22. The method according to claim 20, wherein the multi-channel side information comprises at least one of the following: an inter-channel amplitude difference parameter quantization codebook index, a quantity of channel group pairs, and a channel pair index, wherein
the inter-channel amplitude difference parameter quantization codebook index is used to indicate an inter-channel amplitude difference (ILD) parameter quantization codebook index of each channel;
the quantity of channel group pairs is used to represent a quantity of channel group pairs of a current frame of the multi-channel signal; and
the channel pair index is used to represent an index of a channel pair.
23. An encoding device, characterized in that the encoding device comprises:
a mute flag information acquisition module, configured to acquire mute flag information of a multi-channel signal, wherein the mute flag information comprises: a mute enable flag, and/or a mute flag;
a multi-channel encoding module, configured to perform multi-channel encoding processing on the multi-channel signal to obtain a transmission channel signal of each transmission channel; and
a code stream generation module, configured to generate a code stream according to the transmission channel signal of each transmission channel and the mute flag information, wherein the code stream comprises: the mute flag information and a multi-channel encoding result of the transmission channel signal.
24. A decoding device, characterized in that the decoding device comprises:
a parsing module, configured to parse mute flag information from a code stream of an encoding device, and determine encoding information of each transmission channel according to the mute flag information, wherein the mute flag information comprises: a mute enable flag, and/or a mute flag; and
a processing module, configured to decode the encoding information of each transmission channel to obtain a decoded signal of each transmission channel,
wherein the processing module is further configured to perform multi-channel decoding processing on the decoded signal of each transmission channel to obtain a multi-channel decoded output signal.
25. A terminal device, characterized in that the terminal device comprises: a processor and a memory, wherein the processor communicates with the memory;
the memory is configured to store instructions; and
the processor is configured to execute the instructions in the memory to perform the method according to any one of claims 1 to 17.
26. A terminal device, characterized in that the terminal device comprises: a processor and a memory, wherein the processor communicates with the memory;
the memory is configured to store instructions; and
the processor is configured to execute the instructions in the memory to perform the method according to any one of claims 18 to 22.
27. A computer-readable storage medium comprising instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 17 or claims 18 to 22.
28. A computer program product comprising instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 17 or claims 18 to 22.
29. A computer-readable storage medium storing a code stream generated by the method according to any one of claims 1 to 17.
CN202210699863.7A 2022-03-14 2022-06-20 Encoding and decoding method, encoding and decoding equipment and terminal equipment for multichannel signals Pending CN116798438A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2023/073845 WO2023173941A1 (en) 2022-03-14 2023-01-30 Multi-channel signal encoding and decoding methods, encoding and decoding devices, and terminal device
TW112108251A TW202403728A (en) 2022-03-14 2023-03-07 Coding method and coding device for multi-channel signal, and terminal device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210254868 2022-03-14
CN2022102548689 2022-03-14

Publications (1)

Publication Number Publication Date
CN116798438A (en) 2023-09-22

Family

ID=88044406

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210699863.7A Pending CN116798438A (en) 2022-03-14 2022-06-20 Encoding and decoding method, encoding and decoding equipment and terminal equipment for multichannel signals

Country Status (1)

Country Link
CN (1) CN116798438A (en)


Legal Events

Date Code Title Description
PB01 Publication