WO2009067883A1 - An encoding/decoding method and a device for the background noise - Google Patents

Info

Publication number
WO2009067883A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
noise
coding
noise frame
band
Prior art date
Application number
PCT/CN2008/072939
Other languages
French (fr)
Chinese (zh)
Inventor
Qi Zhang
Jinliang Dai
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2009067883A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/012 Comfort noise or silence coding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/24 Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding

Definitions

  • The present invention relates to the field of voice communication technologies, and in particular to a coding and decoding method and apparatus for background noise.
  • Background art
  • FIG. 1 is a schematic diagram of a method of compressing background noise in a DTX manner in voice communication.
  • VAD Voice Activity Detection
  • SID Silence Insertion Descriptor
  • The corresponding decoding process is as follows: for a speech frame code stream, speech frame decoding reconstructs the speech signal; for noise, the discontinuous transmission system reconstructs a continuous comfort background noise signal from the received non-contiguous SID frames using a specific CNG (Comfort Noise Generation) algorithm.
  • CNG Comfort Noise Generation
  • G.729.1 is the latest generation of speech codec standard released by ITU (International Telecommunication Union).
  • ITU International Telecommunication Union
  • The biggest feature of this embedded speech codec standard is its layered coding, which provides narrowband-to-wideband audio quality at code rates from 8 kb/s to 32 kb/s. It allows outer-layer code streams to be discarded according to channel conditions during transmission, and therefore has good channel adaptability.
  • a narrowband signal refers to a signal with a frequency band of 0 to 4000 Hz
  • a wideband signal refers to a signal with a frequency band of 0 to 8000 Hz
  • an ultrawideband signal refers to a signal with a frequency band of 0 to 16000 Hz.
  • the wideband signal can be decomposed into a low-band signal component and a high-band signal component.
  • the low-band signal (component) refers to a signal of 0 to 4000 Hz, and the low-band signal component can also be called a narrowband signal component.
  • The high-band signal (component) refers to the signal from 4000 to 8000 Hz, and the super-high-band signal (component) refers to the signal from 8000 to 16000 Hz.
  • Layering is achieved by constructing the code stream as an embedded hierarchical structure.
  • The core layer is coded using the G.729 standard, which makes G.729.1 a new type of embedded, layered, multi-rate speech codec.
  • the input is a 20ms superframe.
  • The input signal s_WB(n) is first split into two subbands by a QMF (Quadrature Mirror Filterbank) with filters H1(z) and H2(z). The low subband signal is preprocessed by a high-pass filter with a cutoff frequency of 50 Hz, and the output signal s_LB(n) is encoded by a narrowband embedded CELP (Code-Excited Linear Prediction) encoder at 8 kb/s to 12 kb/s. The difference signal d_LB(n) between s_LB(n) and the local synthesis signal of the CELP encoder at a code rate of 12 kb/s is passed through a perceptual weighting filter W_LB(z), and the weighted signal is transformed into the frequency domain by MDCT (Modified Discrete Cosine Transform).
  • Weighting filter W_LB(z) contains gain compensation to maintain spectral
  • The high-band signal component is multiplied by (-1)^n to perform spectral inversion.
  • The signal after spectral inversion is preprocessed by a low-pass filter with a cutoff frequency of 3000 Hz.
  • The filtered signal is encoded by the TDBWE (Time-Domain Bandwidth Extension) encoder.
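  • The spectral inversion step can be illustrated with a short sketch. It is not taken from the patent or from the G.729.1 reference code; the function name `spectral_fold` and the 64-sample test tone are assumptions made for the example. Multiplying sample n by (-1)^n mirrors the spectrum about one quarter of the sampling rate, so a tone in FFT bin 5 of a 64-point frame moves to bin 32 - 5 = 27, and applying the operation twice restores the original signal.

```python
import numpy as np

def spectral_fold(x):
    """Multiply sample n by (-1)**n, mirroring the spectrum about fs/4."""
    n = np.arange(len(x))
    signs = np.where(n % 2 == 0, 1.0, -1.0)
    return np.asarray(x, dtype=float) * signs

# Hypothetical test signal: a cosine sitting exactly on FFT bin 5.
N = 64
tone = np.cos(2 * np.pi * 5 * np.arange(N) / N)
folded = spectral_fold(tone)

# After folding, the energy sits in bin N/2 - 5 = 27.
peak_bin = int(np.argmax(np.abs(np.fft.rfft(folded))))
print(peak_bin)
```

  • Because the operation is its own inverse, the same routine can serve at the decoder to undo the inversion.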
  • The signal entering the TDAC (Time Domain Alias Cancellation) coding module is likewise first transformed to the frequency domain using MDCT.
  • The two sets of MDCT coefficients (low band and high band) are finally encoded using TDAC.
  • Some parameters are transmitted using the FEC (Frame Erasure Concealment) encoder to reduce the errors caused by frame loss during transmission.
  • FEC Frame Erasure Concealment
  • Figure 2 is a block diagram of the G.729.1 layered encoder system, where the dotted box is the QMF filter bank used for band splitting.
  • Figure 3 is a block diagram of the G.729.1 decoder system. The actual working mode of the decoder is determined by the number of code stream layers received, which is equivalent to the received code rate.
  • the dotted line portion is a QMF filter bank for synthesizing each subband into a full band signal. According to the different code rates received by the decoder, the conditions are as follows:
  • The code stream of the first layer or the first two layers is decoded by the embedded CELP decoder. The decoded signal is further filtered and, after high-pass filtering, fed into the QMF filter bank to synthesize a 16 kHz wideband signal in which the high-band component is set to zero.
  • In addition to the CELP decoder decoding the low-band signal component, the TDBWE decoder decodes the high-band signal component. The high-band signal component is MDCT transformed, the frequency components above 3000 Hz in its spectrum (corresponding to 7000 Hz and above at the 16 kHz sampling rate) are set to 0, and an inverse MDCT is then applied. After overlap-add and spectral inversion, the result is combined in the QMF filter bank with the low-band component decoded by the CELP decoder to synthesize a 16 kHz wideband signal.
  • In addition to the TDBWE decoder decoding the high-band signal component, the TDAC decoder is used to decode the low-band weighted difference signal and the high-band enhancement signal to enhance the full-band signal, and the 16 kHz wideband signal is finally synthesized in the QMF filter bank.
  • The code stream of G.729.1 has a hierarchical structure, which allows outer-layer code streams to be discarded from the outside inward according to the transmission capability of the channel, so that the stream adapts to channel conditions.
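  • As a rough illustration of outside-in layer dropping, the sketch below truncates one encoded 20 ms frame to the highest layer boundary that fits a given channel rate. The rate list (8 and 12 kb/s, then 14 to 32 kb/s in 2 kb/s steps) reflects the G.729.1 layering as commonly documented rather than anything stated in this text, and the helper name `truncate_frame` is hypothetical.

```python
# Layer boundaries in kb/s: 8 (core), 12, then 14..32 in 2 kb/s steps.
# Assumption: taken from the published G.729.1 rate set, not from this document.
LAYER_RATES_KBPS = [8, 12] + list(range(14, 34, 2))
FRAME_MS = 20  # superframe length stated in the text

def truncate_frame(frame: bytes, channel_kbps: float) -> bytes:
    """Keep only the inner layers of one frame that fit the channel rate."""
    usable = [r for r in LAYER_RATES_KBPS if r <= channel_kbps]
    if not usable:
        return b""  # even the core layer does not fit
    keep_bytes = usable[-1] * FRAME_MS // 8  # kb/s * ms = bits, / 8 = bytes
    return frame[:keep_bytes]

full_rate_frame = bytes(80)  # 32 kb/s * 20 ms = 640 bits = 80 bytes
print(len(truncate_frame(full_rate_frame, 13)))  # 12 kb/s layer: 30 bytes
```

  • Truncation works only because the stream is embedded: every prefix that ends on a layer boundary is itself a decodable stream.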
  • A discontinuous transmission mode for noise frames has not been defined in the G.729.1 standard, which means that during the gaps in voice communication the encoder still has to encode noise as if it were speech frames. This not only increases the computational burden of the encoder but also wastes the limited transmission bandwidth of the channel, so it is necessary to introduce a discontinuous transmission mode for noise.
  • The line spectral frequency (LSF) parameters of the current frame are quantized; otherwise, the LSF parameters corresponding to the average LPC parameters of the past 6 frames are quantized. This switching between alternatives introduces discontinuity that does not match the stationary nature of background noise.
  • Table 1: G.729 Annex B SID frame bit allocation
  • The energy of each frame is obtained by smoothing the decoded frame energy, and the line spectral frequency parameters of the last SID frame are copied directly.
  • The above noise coding method is suitable only for encoding narrowband noise; it cannot handle wideband noise and lacks bandwidth scalability.
  • A DTX/CNG noise coding method represented by AMR-WB is also known in the prior art.
  • AMR-WB processes input sampled at 16 kHz in 20 ms frames. It performs variable-rate encoding for signal frames judged to be speech by VAD detection, and uses a fixed coding mode for input judged to be background noise, that is, it outputs one 35-bit SID frame every 7 frames.
  • The SID coding parameters mainly encode the energy and spectral parameters of the background noise.
  • the energy parameter is the logarithmic domain energy of the current noise frame:
  • In AMR-WB, the spectral parameters are represented by ISF (Immittance Spectral Frequency) parameters.
  • The ISF parameter is a 16-dimensional vector transformed from 16th-order LPC (Linear Prediction Coding) coefficients.
  • LPC Linear Prediction Coding
  • The average frame energy is quantized with 6 bits. The 16-dimensional ISF vector is quantized using split vector quantization, in which it is divided into 5 sub-vectors.
  • the SID frame length of the AMR-WB is 35 bits, and its bit allocation is as shown in Table 2:
  • Embodiments of the present invention provide a coding and decoding method and apparatus for background noise, which can perform coding with bandwidth scalability for background noise.
  • An embodiment of the present invention provides a coding and decoding method for background noise, including: when a received audio frame is a noise frame, selecting a noise frame that needs to be coded according to a transmission mode of the current noise frame;
  • the noise frame that needs to be encoded is hierarchically coded.
  • the coding parameters of the noise frame are decoded according to a transmission mode of the current noise frame
  • Background noise reconstruction is performed according to the coding parameters.
  • An embodiment of the present invention further provides an encoder, including:
  • a selecting unit configured to: when the received audio frame is a noise frame, select a noise frame to be encoded according to a transmission mode of the current frame, and send the selection result to the coding unit; and a coding unit configured to hierarchically encode the noise frames that need to be encoded, according to the selection result sent by the selecting unit.
  • An embodiment of the present invention further provides a decoder, including:
  • a decoding unit configured to: when the received audio frame is a layered coded noise frame, decode the coding parameter of the noise frame according to a transmission mode of the current noise frame;
  • a reconstruction unit configured to perform background noise reconstruction according to the coding parameter of the noise frame sent by the decoding unit.
  • the invention also provides a codec system for background noise, comprising:
  • An encoder configured to: when the received audio frame is a noise frame, select a noise frame that needs to be coded according to a transmission mode of the current noise frame, and perform hierarchical coding on the noise frame that needs to be coded;
  • a decoder configured to: when the audio frame received from the encoder is a layered coded noise frame, decode an encoding parameter of the noise frame according to a transmission mode of the current noise frame, and perform background noise according to the coding parameter reconstruction.
  • Embodiments of the present invention have the following advantages over the prior art:
  • The encoding end selects the noise frames to be encoded according to the transmission mode of the current noise frame and applies layered coding to them, so that background noise frames are coded with bandwidth scalability;
  • The decoding end decodes the coding parameters of the layered-coded noise frame according to its transmission mode and performs background noise reconstruction, achieving bandwidth scalability for background noise.
  • FIG. 1 is a schematic diagram of a method for compressing background noise in a DTX manner in the prior art
  • FIG. 2 is a schematic diagram of a G.729.1 encoder system in the prior art
  • FIG. 3 is a schematic diagram of a G.729.1 decoder system in the prior art
  • FIG. 4 is a schematic flowchart of a background noise encoding method according to Embodiment 1 of the present invention
  • FIG. 5 is a schematic flowchart of a background noise encoding method according to Embodiment 2 of the present invention
  • FIG. 6 is a schematic diagram of a DTX noise encoding module according to Embodiment 2 of the present invention
  • FIG. 7 is a schematic diagram of a TDBWE encoder system for background noise according to Embodiment 2 of the present invention.
  • FIG. 8 is a schematic diagram of an encoder system according to Embodiment 2 of the present invention.
  • FIG. 9 is a schematic diagram of a CNG noise decoding module of a decoding end according to Embodiment 2 of the present invention
  • FIG. 10 is a schematic diagram of a method for recovering a low-band signal component by using a reconstructed low-band coding parameter according to Embodiment 2 of the present invention
  • FIG. 11 is a schematic diagram of a method for recovering a high-band signal component by using a reconstructed high-band coding parameter according to Embodiment 2 of the present invention.
  • FIG. 12 is a schematic diagram of a decoder system according to Embodiment 2 of the present invention.
  • FIG. 13 is a schematic flow chart of a method for encoding background noise according to Embodiment 3 of the present invention.
  • FIG. 14 is a schematic diagram of a coding end system of a noise frame according to Embodiment 3 of the present invention
  • FIG. 15 is a schematic diagram of a decoding end system of a noise frame according to Embodiment 3 of the present invention
  • FIG. 16 is a schematic diagram of an encoder according to Embodiment 5 of the present invention
  • FIG. 17 is a schematic diagram of a decoder according to Embodiment 6 of the present invention.
  • Detailed description
  • In Embodiment 1 of the present invention, a method for encoding and decoding background noise is shown in FIG. 4, and the specific steps are as follows:
  • Step S401: At the encoding end, use VAD detection on the input audio frame to determine the type of the current audio frame. If the current audio frame is a voice frame, the audio frame is encoded according to the voice frame coding algorithm; if the current frame is a noise frame and the previous frame is a speech frame (that is, a switch from a speech frame to a noise frame is occurring), the flow proceeds to step S402.
  • Step S402: If a switch from a speech frame to a noise frame is occurring, the trailing phase may be entered first.
  • In the trailing phase, during the N frames after the switch from the voice frame to the noise frame, the current noise frame is still encoded according to the voice frame encoding algorithm, but at a reduced encoding rate.
  • Step S403 Select a noise frame to be encoded according to the transmission mode.
  • Two transmission modes can be used for coded transmission of the current frame: discontinuous transmission (DTX) mode and continuous transmission mode. In the discontinuous transmission mode, it is determined whether the current frame needs to be encoded. If the current noise frame needs to be encoded, the current frame is selected as the noise frame to be encoded; otherwise no processing is performed on the current frame. In the continuous transmission mode, the current frame is directly selected as the noise frame to be encoded, that is, all received noise frames are encoded.
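  • The frame selection logic of steps S401 to S403 can be sketched as a small per-frame state machine. This is an illustrative reading of the text, not the patent's implementation: the names `Mode` and `classify_frames`, the action labels, and the `needs_update` callback (which stands in for the DTX decision) are all assumptions.

```python
from enum import Enum

class Mode(Enum):
    DTX = "discontinuous"
    CONTINUOUS = "continuous"

def classify_frames(vad_flags, trailing_frames, mode, needs_update):
    """Return one action per frame: 'speech', 'trailing', 'noise' or 'skip'.

    vad_flags       -- iterable of booleans, True when VAD reports speech
    trailing_frames -- N, the length of the trailing phase in frames
    needs_update    -- hypothetical callback: frame index -> bool (DTX decision)
    """
    actions = []
    remaining = 0
    prev_speech = False
    for i, is_speech in enumerate(vad_flags):
        if is_speech:
            actions.append("speech")
            remaining = 0
        else:
            if prev_speech:            # speech -> noise switch just occurred
                remaining = trailing_frames
            if remaining > 0:          # still coded as speech, at a lower rate
                actions.append("trailing")
                remaining -= 1
            elif mode is Mode.CONTINUOUS or needs_update(i):
                actions.append("noise")    # layered noise coding
            else:
                actions.append("skip")     # DTX: nothing is sent
        prev_speech = is_speech
    return actions

vad = [True, False, False, False, False, True, False]
print(classify_frames(vad, 2, Mode.DTX, lambda i: i == 3))
```

  • In continuous mode the callback is never consulted, which matches the statement that every received noise frame is encoded.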
  • Step S404: Perform narrowband core layer coding on the noise frame that needs to be encoded.
  • The low-band signal component of the noise frame that needs to be encoded is obtained, and core layer parameter encoding is performed on the low-band signal component.
  • The method for obtaining the low-band signal component of a noise frame to be encoded includes: performing band splitting on the noise frame to divide it into a low-band signal component and a high-band signal component; or performing high-pass filtering on the full-band noise signal and downsampling it to obtain the low-band signal component.
  • The method for performing narrowband core layer coding on the acquired low-band signal component comprises: performing linear prediction analysis on the low-band signal component of the noise frame to obtain linear prediction coefficients and the signal energy; converting the linear prediction coefficients into spectral parameters and vector quantizing them to obtain quantized spectral parameters; quantizing the signal energy to obtain the frame energy; and using the quantized spectral parameters and frame energy as the narrowband core layer parameters of the noise frame.
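  • The linear prediction analysis in this step can be sketched with the standard autocorrelation method and the Levinson-Durbin recursion. This is a generic textbook formulation, not the patent's code: windowing, lag windowing, the conversion to spectral parameters and the vector quantizer are all omitted, and the names `lpc_analysis` and `frame_energy_db` and the default order of 10 are assumptions.

```python
import numpy as np

def lpc_analysis(frame, order=10):
    """Autocorrelation method with Levinson-Durbin recursion.

    Returns (a, err): a[0] == 1 and A(z) = sum_k a[k] z^-k is the
    prediction-error filter; err is the residual energy.
    """
    x = np.asarray(frame, dtype=float)
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    if err <= 0.0:
        return a, 0.0                      # silent frame: nothing to predict
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                     # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err

def frame_energy_db(frame):
    """Mean-square frame energy in dB (small floor avoids log of zero)."""
    x = np.asarray(frame, dtype=float)
    return 10.0 * np.log10(np.mean(x * x) + 1e-12)

# Hypothetical check: a decaying exponential is a first-order AR signal,
# so the analysis should return a[1] close to -0.5 and a[2] close to 0.
decay = 0.5 ** np.arange(200)
coeffs, residual = lpc_analysis(decay, order=2)
```

  • In the method described above, the coefficients would next be converted to spectral parameters before quantization, because those are better behaved under quantization than the raw predictor coefficients.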
  • Step S405 If the enhancement layer coding is further required, the noise frame encoded by the narrowband core layer is subjected to extension layer coding.
  • the noise frame is subjected to narrowband enhancement layer coding, that is, the quantization error of the spectral parameters in the narrowband core layer and the quantization error of the signal energy are quantized.
  • Wideband extension layer coding is performed on the noise frame, that is, the high-band signal component of the noise frame is subjected to extended parameter coding.
  • the extension layer can be one layer or multiple layers.
  • the broadband extension layer includes a broadband core layer and a broadband enhancement layer.
  • The wideband extension layer coding of the noise frame specifically includes: acquiring the time-domain envelope and the frequency-domain envelope of the high-band signal component; subtracting the quantized time-domain envelope from each component of the frequency-domain envelope; and splitting the resulting vector into a plurality of sub-vectors that are quantized separately to obtain the wideband extension layer coding parameters.
  • Step S406 After the encoding is completed, the encoded noise frame is transmitted.
  • Step S407: At the decoding end, decode the encoding parameters from the received encoded code stream and determine the type of the current audio frame. If the current audio frame is a voice frame, decode the audio frame according to the voice frame decoding algorithm; otherwise, go to step S408.
  • Step S408: If the received audio frame is a noise frame, the coding parameters of the noise frame are decoded according to the transmission mode of the current noise frame.
  • In the discontinuous transmission mode, the coding parameters of a received noise frame are decoded; for a noise frame that was not transmitted, the coding parameters of the current noise frame are derived from previously received noise frames or from the coding parameters buffered in the trailing phase.
  • In the continuous transmission mode, the coding parameters are decoded for every received noise frame.
  • Step S409 Perform background noise reconstruction according to the decoded coding parameters.
  • The coefficients of the synthesis filter are calculated using the reconstructed spectral parameters; Gaussian random noise is used as the excitation and passed through the calculated synthesis filter, and the reconstructed energy parameter is used for time-domain shaping to reconstruct the background noise signal. Alternatively, the low-band coding parameters are CELP decoded to obtain the decoded low-band signal component, which is upsampled to a full-band signal and spectrally extended to reconstruct the background noise signal.
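  • The first reconstruction option (Gaussian excitation, synthesis filtering, energy shaping) can be sketched as follows. This is a simplified illustration, not the patent's decoder: it assumes the spectral parameters have already been converted back to predictor coefficients a[0..p] with a[0] = 1, it applies one gain per frame as the time-domain shaping, and the function name `synthesize_comfort_noise` is hypothetical.

```python
import numpy as np

def synthesize_comfort_noise(a, target_energy, n_samples, rng):
    """Filter Gaussian noise through 1/A(z), then scale to a target energy.

    a             -- predictor coefficients with a[0] == 1 (assumed input)
    target_energy -- desired mean-square value of the output frame
    rng           -- numpy random Generator supplying the excitation
    """
    excitation = rng.standard_normal(n_samples)
    out = np.zeros(n_samples)
    order = len(a) - 1
    for n in range(n_samples):
        acc = excitation[n]
        for k in range(1, min(order, n) + 1):
            acc -= a[k] * out[n - k]       # all-pole synthesis recursion
        out[n] = acc
    current = np.mean(out * out)
    if current > 0.0:
        out *= np.sqrt(target_energy / current)   # energy shaping
    return out

rng = np.random.default_rng(1)
noise = synthesize_comfort_noise([1.0, -0.5], 4.0, 160, rng)
```

  • A real decoder would also carry the filter memory across frames and smooth the gain between SID updates; both are left out to keep the sketch short.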
  • The TDBWE decoding algorithm may be used to reconstruct the background noise signal from the noise frame; alternatively, the TDAC decoding algorithm may be used.
  • The method for reconstructing the background noise signal from the noise frame by using the TDBWE decoding algorithm is as follows: calculate the coefficients of the synthesis filter using the reconstructed spectral parameters, use Gaussian random noise as the excitation, perform synthesis filtering through the calculated synthesis filter, and use the reconstructed energy parameter for time-domain shaping to obtain the low-band signal component of the background noise signal. Using Gaussian random noise as the excitation source, the reconstructed high-band coding parameters are used for time-domain shaping and frequency-domain shaping of the excitation source to reconstruct the high-band signal component of the background noise signal. QMF synthesis filtering is then performed on the reconstructed low-band and high-band signal components to obtain the background noise signal.
  • The method for reconstructing the background noise signal from the noise frame by using the TDAC decoding algorithm is as follows: the low-band coding parameters are decoded by the CELP decoding algorithm to obtain the low-band signal component, which is upsampled and spectrally extended to obtain the full-band signal; inverse quantization and an inverse MDCT are applied to the reconstructed high-band coding parameters to obtain a residual signal, which is combined with the full-band signal to obtain a wideband background noise signal.
  • In Embodiment 2 of the present invention, encoding of the high-band signal component with the TDBWE encoding algorithm is taken as an example. A background noise encoding and decoding method is shown in FIG. 5, and the specific steps are as follows:
  • Step S501: At the encoding end, the input has a frame length of 20 ms and a sampling rate of 16000 Hz. VAD detection is applied to the input audio frame to determine the type of the current frame. If the current frame is a voice frame, go to step S502; if the current frame is a noise frame and the previous frame is a voice frame (that is, a switch from a voice frame to a noise frame is occurring), go to step S503.
  • the frame structure of the full-rate speech frame used in this embodiment is as shown in Table 3.
  • TDBWE Layer 3 - Broadband Enhancement Layer
  • Step S502 If the current frame is a voice frame, the current frame is encoded according to a voice frame coding algorithm, and a coded stream of up to 32 kb/s can be encoded.
  • Step S503: If a switch from a voice frame to a noise frame is occurring, the trailing phase may be entered first.
  • The trailing phase lasts N frames, that is, during the N frames after the switch from the voice frame to the noise frame, the current noise frame is still encoded according to the voice frame encoding algorithm, but at a reduced encoding rate. For example, if the encoding rate of the speech frame before switching is 8 kb/s or 12 kb/s, then the packet is advanced.
  • The learning and training of the noise parameters can be completed at the same time, that is, the autocorrelation function of the low-band signal component, the low-band coding parameters and the high-band coding parameters are buffered during the trailing phase and used to initialize the encoding of subsequent noise frames.
  • Two transmission modes can be used for coded transmission of the current frame: discontinuous transmission (DTX) mode and continuous transmission mode. If the current frame is encoded and transmitted in the discontinuous transmission mode, step S504 is performed. If the continuous transmission mode is used, all received noise frames are encoded, and steps S505 to S507 are performed directly.
  • Step S504 Determine whether the current noise frame needs to be encoded. If the current noise frame needs to be encoded, go to step S505, otherwise no processing is performed on the current frame.
  • Under the DTX policy, specific criteria may be used to determine whether the current frame needs to be encoded: the distortion of the spectrum and energy of the current noise frame relative to the long-term average spectrum and energy (that is, the averages of the previously buffered coding parameters) is calculated. If the distortion exceeds a certain threshold, the noise frame is encoded; otherwise no processing is performed on the current frame.
  • The implementation module for encoding the noise frame is shown in FIG. 6.
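  • A minimal sketch of such a DTX decision follows. The text does not specify the distortion measures or the thresholds, so the mean-squared spectral distance, the absolute energy difference in dB, both default thresholds and the name `needs_sid_update` are assumptions chosen only for illustration.

```python
import numpy as np

def needs_sid_update(spec, energy_db, avg_spec, avg_energy_db,
                     spec_threshold=0.1, energy_threshold_db=3.0):
    """Decide whether the current noise frame should be encoded and sent.

    spec / avg_spec   -- current and long-term average spectral parameters
    energy_db / avg_energy_db -- current and long-term average frame energy
    Thresholds are hypothetical placeholders, not values from the patent.
    """
    diff = np.asarray(spec, dtype=float) - np.asarray(avg_spec, dtype=float)
    spec_dist = float(np.mean(diff ** 2))
    energy_dist = abs(float(energy_db) - float(avg_energy_db))
    return spec_dist > spec_threshold or energy_dist > energy_threshold_db

average = np.linspace(0.1, 3.0, 10)      # stand-in spectral parameter vector
print(needs_sid_update(average, 40.0, average, 40.0))   # unchanged noise
print(needs_sid_update(average, 45.0, average, 40.0))   # energy jumped 5 dB
```

  • Either measure alone can trigger an update, so a change in noise level is reported even when the spectral shape stays the same.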
  • Step S505 Perform narrowband core layer coding on the current noise frame.
  • The narrowband core layer parameter coding may use the CELP model. QMF band-splitting filtering is performed on the background noise frame that needs SID coding, dividing it into several subbands by frequency.
  • This embodiment takes the simplest case: the background noise frame is divided into two subbands, a low-band signal component with a frequency range of 0 to 4000 Hz and a high-band signal component with a frequency range of 4000 to 8000 Hz.
  • Step S506 If the extension layer parameter coding is needed, the extension layer parameter coding is performed on the noise frame encoded by the narrowband core layer.
  • The quantization error of the spectral parameters in the narrowband core layer and the quantization error of the energy parameter are further quantized. That is, the difference between the spectral parameter before quantization and its quantized value in the core layer is quantized in the narrowband enhancement layer, and the quantization result is the index value in the spectral quantization codebook of the enhancement layer. For the energy parameter E, a similar method is used to quantize the difference between E and its core-layer quantized value, yielding the noise frame encoded by the narrowband enhancement layer.
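  • The idea of quantizing the core-layer quantization error in an enhancement layer can be shown with a two-stage scalar quantizer. The patent uses codebook-based quantization for the spectral parameters; the uniform scalar quantizers, step sizes and function names below are simplifying assumptions that keep the example short.

```python
import numpy as np

def quantize(value, step):
    """Uniform scalar quantizer: return (index, reconstructed value)."""
    index = int(np.round(value / step))
    return index, index * step

def encode_two_layers(value, core_step=1.0, enh_step=0.25):
    """Core layer codes the value; enhancement layer codes the core error."""
    core_idx, core_rec = quantize(value, core_step)
    enh_idx, _ = quantize(value - core_rec, enh_step)
    return core_idx, enh_idx

def decode_layers(core_idx, enh_idx=None, core_step=1.0, enh_step=0.25):
    """Decode with or without the enhancement layer index."""
    rec = core_idx * core_step
    if enh_idx is not None:
        rec += enh_idx * enh_step      # refine using the quantized error
    return rec

core_idx, enh_idx = encode_two_layers(3.3)
coarse = decode_layers(core_idx)            # core layer only
fine = decode_layers(core_idx, enh_idx)     # core plus enhancement layer
```

  • A decoder that receives only the core index still reconstructs a usable value, which is what lets the enhancement layer be dropped in transit without breaking decoding.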
  • the noise frame encoded by the narrowband enhancement layer is subjected to extended parameter coding.
  • The high-band signal component is separated from the background noise frame, and the TDBWE encoding algorithm is used to perform extended parameter encoding on the high-band signal component, as shown in FIG. 7. That is, the time-domain envelope and the frequency-domain envelope of the high-band signal component are first calculated separately.
  • the calculation method of the time domain envelope is as shown in formula (1):
  • I is the number of time domain envelopes.
  • the calculation method of the frequency domain envelope is as follows: First, a high-band signal component is windowed using a 128-tap Hanning window. The window function is as shown in equation (2):
  • the high-band signal component after windowing is:
  • j is the number of frequency domain envelopes.
  • The embodiment of the present invention can also be applied to obtain a frequency-domain envelope for any sub-band of the high band, and the number of frequency-domain envelopes can be any value greater than 0, so the method is not limited to the application in G.729.1. Because the human ear cannot distinguish the time-domain envelope of background noise very finely, the frame does not need to be divided into 16 time-domain envelopes as for a speech frame; it is only necessary to calculate the average time-domain envelope of the entire frame, as shown in equation (6):
  • the obtained time domain envelope is quantized using a uniform quantizer with a length of 5 bits and a quantization step size of 3 dB.
  • The quantized time-domain envelope is then subtracted from each component of the J-dimensional frequency-domain envelope. The resulting vector is split into 3 sub-vectors, which are quantized separately. The quantized time-domain envelope and frequency-domain envelope are output through the multiplexer to obtain the noise frame encoded by the wideband extension layer.
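  • The envelope quantization described above can be sketched as follows. The 5-bit length, the 3 dB step and the split into 3 sub-vectors come from the text; the lower bound of 0 dB for the quantizer range, the use of dB values throughout, the 12-dimensional example envelope and both function names are assumptions. The codebook search for each sub-vector is omitted.

```python
import numpy as np

def quantize_time_envelope(t_env_db, step_db=3.0, bits=5, min_db=0.0):
    """Uniform quantizer: 2**bits levels spaced step_db apart from min_db."""
    levels = 2 ** bits
    raw = np.round((t_env_db - min_db) / step_db)
    index = int(np.clip(raw, 0, levels - 1))
    return index, min_db + index * step_db

def split_frequency_envelope(f_env_db, t_env_q_db, n_parts=3):
    """Remove the quantized time envelope, then split into sub-vectors."""
    residual = np.asarray(f_env_db, dtype=float) - t_env_q_db
    return np.array_split(residual, n_parts)

# Hypothetical frame: average time envelope 40 dB, flat 45 dB frequency envelope.
t_index, t_quant = quantize_time_envelope(40.0)
sub_vectors = split_frequency_envelope(np.full(12, 45.0), t_quant)
print(t_index, t_quant, [len(v) for v in sub_vectors])
```

  • Subtracting the quantized (rather than the unquantized) time envelope matters: the decoder only has the quantized value, so this keeps encoder and decoder in step when the frequency envelope is rebuilt.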
  • Step S507 After the encoding is completed, the encoded noise frame is transmitted.
  • The encoder system of the embodiment of the present invention is as shown in FIG. 8.
  • Step S508: At the decoding end, decode the encoding parameters from the received encoded code stream and determine the type of the current frame. If the current frame is a voice frame, decode the audio frame according to the voice frame decoding algorithm; if the current frame is a noise frame, go to step S509.
  • The media gateway may discard some coded bits from the outer layer to the inner layer according to channel conditions to adapt to the channel transmission capability. So even if the encoder sends the full-rate stream, the decoder may be unable to receive it, and can only decode the code stream actually received at the corresponding rate.
  • Step S509 Reconstruct the coding parameters of the received noise frame, and reconstruct a background noise signal according to the coding parameters of the noise frame.
  • In the discontinuous transmission mode, the decoder receives SID frames only intermittently. It reconstructs the encoding parameters for each received noise frame; for a frame that was not transmitted, it reconstructs the encoding parameters of the current frame from previously received noise frames or from the noise parameters learned in the trailing phase, and then performs background noise reconstruction.
  • The decoding module in the discontinuous transmission mode is shown in FIG. 9.
  • In the continuous transmission mode, the coding parameters are reconstructed for all received noise frames for background noise reconstruction.
  • If the received noise frame contains only the narrowband core layer, the coding parameters of the narrowband core layer are decoded and a synthesis filter is constructed from the reconstructed spectral parameters. Gaussian random noise is used as the excitation signal and filtered by the synthesis filter, and the filter output is then shaped with the decoded energy parameter E to reconstruct the low-band signal component of the background noise, as shown in FIG.
  • If the decoder is also required to output a wideband signal, the high-band signal component is set to 0 and the wideband output is synthesized by the QMF synthesis filter from the reconstructed low-band signal component. If the decoder is not required to output a wideband signal, the reconstructed low-band signal component can be output directly.
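The low-band reconstruction described above (Gaussian excitation, synthesis filter, energy shaping) can be sketched as follows. This is a minimal sketch under stated assumptions: the spectral parameters are taken to be already converted to direct-form LPC coefficients, the energy parameter E is treated as the target mean energy per sample, and the function name is hypothetical. It is not the codec's actual CNG routine.

```python
import math
import random
from typing import List, Sequence


def synthesize_comfort_noise(lpc: Sequence[float], target_energy: float,
                             n_samples: int, seed: int = 0) -> List[float]:
    """Gaussian excitation -> all-pole synthesis filter 1/A(z) -> energy shaping.

    lpc holds a_1..a_p of A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.
    target_energy is the desired mean energy per sample (assumed convention).
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    rng = random.Random(seed)
    history = [0.0] * len(lpc)  # previous outputs y[n-1], y[n-2], ...
    filtered: List[float] = []
    for _ in range(n_samples):
        excitation = rng.gauss(0.0, 1.0)
        sample = excitation - sum(a * h for a, h in zip(lpc, history))
        filtered.append(sample)
        if history:
            history = [sample] + history[:-1]
    # Shape the filter output so its mean energy matches the decoded energy.
    mean_energy = sum(v * v for v in filtered) / n_samples
    gain = math.sqrt(target_energy / mean_energy) if mean_energy > 0.0 else 0.0
    return [gain * v for v in filtered]


noise = synthesize_comfort_noise([-0.9], target_energy=4.0, n_samples=160)
```

For a wideband output, this low-band component would then be fed to the QMF synthesis filter with a zero high band, as the text describes.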
  • If the received noise frame further includes a narrowband enhancement layer: the narrowband enhancement layer only refines the quantization precision of the core-layer spectral and energy parameters and adds no new parameters. The decoded spectral and energy parameters are therefore used, and a reconstructed wideband or narrowband background noise signal is obtained by a decoding process similar to that for a stream containing only the narrowband core layer.
  • the decoder system of the embodiment of the present invention is as shown in FIG.
  • Taking the use of the TDAC coding algorithm for the high-band signal component as an example, a background noise encoding and decoding method is shown in FIG. 13. The specific steps are as follows: Step S1301: At the encoding end, perform VAD detection on the input audio frame to determine the type of the current frame. If the current frame is a voice frame, go to step S1302; if the current frame is a noise frame and the previous frame was a voice frame (that is, a switch from voice frame to noise frame has just occurred), go to step S1303.
  • the frame structure of the full-rate noise frame used in this embodiment is as shown in Table 5: Table 5 Bit allocation of noise frames
  • Step S1302: If the current frame is a voice frame, encode it according to the voice frame coding algorithm, producing a code stream of up to 32 kb/s.
  • Step S1303: If a switch from voice frame to noise frame has just occurred, first enter the trailing phase.
  • The trailing phase lasts N frames: for N frames after the switch from voice frame to noise frame, the current noise frame is still encoded with the voice frame coding algorithm, but at a reduced coding rate. For example, if the coding rate of the speech frame before switching is 8 kb/s or 12 kb/s, then the packet is advanced.
  • Learning and training of the noise parameters can be completed during the same period; that is, the autocorrelation function of the low-band signal component, the low-band coding parameters and the high-band coding parameters are buffered during the trailing phase and used to initialize the encoding of subsequent noise frames.
  • Two transmission modes can be used to encode and transmit the current frame: discontinuous transmission (DTX) mode and continuous transmission mode. If the current frame is encoded and transmitted in the discontinuous transmission mode, step S1304 is performed. If the continuous transmission mode is used, all received noise frames are encoded, and steps S1305 to S1307 are performed directly.
  • Step S1304 Determine whether it is necessary to encode the current noise frame. If the current noise frame needs to be encoded, go to step S1305, otherwise no processing is performed on the current frame.
  • the method for determining whether the current frame needs to be encoded is the same as the step S504 in the second embodiment, and details are not described herein again.
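The control flow of steps S1301 to S1304 can be sketched as a small state machine. This is an illustrative sketch only: the class and enum names are hypothetical, and `needs_update` is a placeholder callback standing in for the step S504 decision, whose criterion is not reproduced in this section. As a simplification, noise frames at start-up (with no preceding voice frame) are also treated as trailing-phase frames.

```python
from enum import Enum
from typing import Callable


class Action(Enum):
    SPEECH = "encode with the voice frame algorithm"
    TRAILING = "encode with the voice frame algorithm at a reduced rate"
    NOISE = "layered noise coding (steps S1305 onward)"
    SKIP = "no processing for this frame"


class NoiseFrameSelector:
    def __init__(self, trailing_frames: int, dtx: bool,
                 needs_update: Callable[[], bool]):
        self.trailing_frames = trailing_frames  # N in the text
        self.dtx = dtx                          # True: discontinuous mode
        self.needs_update = needs_update        # placeholder for step S504
        self.noise_run = 0  # consecutive noise frames since the last voice frame

    def next_action(self, is_speech: bool) -> Action:
        if is_speech:
            self.noise_run = 0
            return Action.SPEECH
        self.noise_run += 1
        if self.noise_run <= self.trailing_frames:
            return Action.TRAILING
        if not self.dtx:
            return Action.NOISE  # continuous mode encodes every noise frame
        return Action.NOISE if self.needs_update() else Action.SKIP


vad_flags = [True, False, False, False]
dtx_sel = NoiseFrameSelector(trailing_frames=2, dtx=True, needs_update=lambda: False)
dtx_actions = [dtx_sel.next_action(flag) for flag in vad_flags]
cont_sel = NoiseFrameSelector(trailing_frames=2, dtx=False, needs_update=lambda: False)
cont_actions = [cont_sel.next_action(flag) for flag in vad_flags]
```

The only difference between the two modes is the last branch: in continuous mode every post-trailing noise frame is encoded, while in DTX mode the callback decides.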
  • Step S1305: Perform high-pass filtering and down-sampling on the full-band noise signal to obtain the low-band signal component of the noise frame.
  • The low-band signal component of the noise frame can be obtained either with the QMF filtering method of the second embodiment or with high-pass filtering followed by down-sampling.
  • High-pass filtering followed by down-sampling is preferred.
  • The noise signal x(n) can be high-pass filtered with a second-order elliptic high-pass filter to obtain the filtered noise signal y(n). The transfer function is shown in formula (7):
  • Step S1306: Pre-emphasize the low-band signal component of the noise frame and then perform CELP coding to obtain the low-band coding parameters of the noise frame. The noise frame may contain only narrowband core layer parameters, or may contain both the narrowband core layer and a narrowband enhancement layer.
  • The spectral parameters and the frame energy are used as the narrowband core layer parameters of the background noise.
  • Step S1307: Reconstruct the low-band signal component using the obtained low-band coding parameters of the noise frame.
  • A synthesis filter is constructed from the reconstructed spectral parameters. Gaussian random noise is used as the excitation signal and filtered by the synthesis filter, and the filter output is shaped with the decoded energy parameter to reconstruct the low-band signal component of the background noise.
  • Step S1308: Upsample the reconstructed low-band signal component to the original sampling rate and perform spectrum extension to obtain the reconstructed full-band signal.
  • Step S1309: Apply an MDCT to the residual between the original full-band signal and the reconstructed full-band signal, quantize and encode the MDCT coefficients to obtain the high-band coding parameters of the noise frame, and reconstruct the high-band signal component of the noise frame. The noise frame may contain only a wideband core layer, or both a wideband core layer and a wideband enhancement layer.
  • Step S1310: The low-band signal component and the high-band signal component are processed by the multiplexer to obtain a hierarchically structured code stream of the background noise, which is then transmitted.
  • the encoder system of the embodiment of the present invention is as shown in FIG.
  • Step S1311: At the decoding end, decode the coding parameters from the received code stream and determine the type of the current frame. If the current frame is a voice frame, decode the audio signal according to the voice frame decoding algorithm; if the current frame is a noise frame, go to step S1312.
  • The media gateway can discard the outer-layer coded bits of a noise frame when the transmission characteristics of the channel require it, without affecting decoding of the inner-layer bits.
  • The decoder decodes based on the code stream actually received.
  • Step S1312: Reconstruct the coding parameters of the received noise frame, and reconstruct the background noise signal from those coding parameters.
  • CELP decoding is performed on the received noise frame to obtain the decoded low-band signal component, which is upsampled to a full-band signal and spectrum-extended to obtain the reconstructed background noise signal.
  • The low-band coding parameters of the received noise frame are decoded by the CELP decoding algorithm into the low-band signal component, which is upsampled and spectrum-extended to obtain a full-band signal. The high-band coding parameters (i.e., the MDCT coefficients) of the received noise frame are inverse-quantized and inverse-MDCT-transformed to obtain a residual signal, which is added to the full-band signal reconstructed from the low-band signal component to obtain the final reconstructed full-band background noise.
  • the block diagram of the decoder system of this embodiment is as shown in FIG.
  • The encoding end selects the noise frames that need to be encoded according to the transmission mode of the current noise frame and performs layered coding on them, so that background noise frames are encoded with bandwidth scalability. The decoding end decodes the coding parameters of the noise frame according to the transmission mode of the received layered-coded noise frame and performs background noise reconstruction, achieving bandwidth-scalable decoding of the background noise.
  • Embodiment 4 of the present invention provides a codec system, including:
  • the encoder 10 is configured to: when the received audio frame is a noise frame, select a noise frame to be encoded according to a transmission mode of the current noise frame, and perform hierarchical coding on the noise frame that needs to be coded.
  • the decoder 20 is configured to: when the audio frame received from the encoder is a layered coded noise frame, decode the coding parameters of the noise frame according to the transmission mode of the current noise frame, and perform background noise reconstruction according to the coding parameter.
  • Embodiment 5 of the present invention provides an encoder, as shown in FIG. 16, including: a selecting unit 11, configured to, when the received audio frame is a noise frame, select the noise frames that need to be encoded according to the transmission mode of the current frame, and send the selection result to the coding unit.
  • The coding unit 12 is configured to perform layered coding on the noise frames that need to be encoded, according to the result sent by the selecting unit.
  • The encoder further includes: a determining unit 13, configured to determine the type of the currently received audio frame and, when the audio frame is a noise frame and the previous frame was a voice frame, send the received noise frames to the voice frame coding unit during a specific frame time, and send the received noise frames to the selecting unit 11 after the specific frame time.
  • The voice frame coding unit 14 is configured to, after receiving a noise frame sent by the determining unit 13, encode the noise frame with the voice coding algorithm at a reduced coding rate, and buffer the coding parameters of the received noise frame.
  • the coding unit 12 further includes: a low band coding sub-unit 121 for performing core layer coding on the low band signal component of the noise frame.
  • the high-band coding sub-unit 122 is configured to perform enhancement layer coding on the high-band signal component of the noise frame encoded by the core layer coding sub-unit.
  • Embodiment 6 of the present invention provides a decoder as shown in FIG. 17, which includes:
  • the decoding unit 21 is configured to: when the received audio frame is a layered coded noise frame, decode the coding parameter of the noise frame according to the transmission mode of the current noise frame.
  • the reconstruction unit 22 is configured to perform background noise reconstruction according to the coding parameters of the noise frame transmitted by the decoding unit.
  • the reconstruction unit 22 further includes: a low-band sub-unit 221, configured to use a low-band coding parameter output by the decoding unit when the received noise frame includes only the narrow-band core layer or both the narrow-band core layer and the narrow-band enhancement layer, The low-band signal component of the background noise signal is reconstructed.
  • the high-band sub-unit 222 is configured to reconstruct a high-band signal component of the background noise signal by using a high-band coding parameter output by the decoding unit when the received noise frame further includes the broadband extension layer.
  • the synthesizing subunit 223 is configured to perform synthesis filtering on the low band signal component and the high band signal component to obtain a background noise signal.
  • The encoding end selects the noise frames that need to be encoded according to the transmission mode of the current noise frame and performs layered coding on them, so that background noise frames are encoded with bandwidth scalability. The decoding end decodes the coding parameters of the noise frame according to the transmission mode of the received layered-coded noise frame and performs background noise reconstruction, achieving bandwidth-scalable decoding of the background noise.
  • The present invention can be implemented in hardware, or in software together with a necessary general-purpose hardware platform. The technical solution of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes a number of instructions that cause a computer device (such as a personal computer, a server, or a network device) to perform the methods described in the embodiments of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A method and a device for encoding and decoding background noise are provided. In the method, when a received audio frame is a noise frame, the noise frames that need to be encoded are selected according to the transmission mode of the current noise frame, and layered coding is performed on those noise frames. The background noise frame is thereby encoded with bandwidth scalability. Correspondingly, the decoding method achieves bandwidth-scalable decoding of the background noise.

Description

Method and apparatus for encoding and decoding background noise

This application claims priority to Chinese Patent Application No. 200710169832.6, filed with the Chinese Patent Office on November 7, 2007 and entitled "Method and apparatus for encoding and decoding background noise", which is incorporated herein by reference in its entirety.

Technical field
The present invention relates to the field of voice communication technologies, and in particular to a method and apparatus for encoding and decoding background noise.

Background
In voice communication, only about 40% of the time carries useful signal dominated by speech, while about 60% consists of speech gaps that carry only background noise. Transmitting this background noise at the same bit rate as speech wastes a large amount of network bandwidth. Not transmitting it at all makes the received audio sound discontinuous and uncomfortable, especially when the background noise is strong, and can even impair the listener's understanding of the speech. Against this background, many methods have been developed that compress background noise using DTX (Discontinuous Transmission). They reduce the bandwidth of voice communication while maintaining auditory continuity at the receiving end.
FIG. 1 is a schematic diagram of a method for compressing background noise with DTX in voice communication. At the encoding end, VAD (Voice Activity Detection) is performed on the input signal to determine the type of the current frame. If the current frame is a speech signal, the corresponding speech coding is performed; if the current frame is background noise, the DTX codec system encodes an SID (Silence Insertion Descriptor) frame according to the corresponding discontinuous transmission strategy. At the decoding end, the speech frame code stream is decoded to reconstruct the speech signal, while the discontinuous transmission system uses a specific CNG (Comfort Noise Generation) algorithm to reconstruct a continuous comfort noise signal from the non-contiguous SID frame stream it receives.
G.729.1 is a new-generation speech codec standard recently released by the ITU (International Telecommunication Union). The most notable feature of this embedded speech codec standard is layered coding: it provides narrowband to wideband audio quality at bit rates from 8 kb/s to 32 kb/s and allows outer-layer code streams to be discarded during transmission according to channel conditions, which gives it good channel adaptability. In the speech codec field, a narrowband signal generally refers to a signal in the 0 to 4000 Hz band, a wideband signal to a signal in the 0 to 8000 Hz band, and an ultra-wideband signal to a signal in the 0 to 16000 Hz band. In this document, a wideband signal can be decomposed into a low-band signal component and a high-band signal component. The low-band signal (component) refers to the 0 to 4000 Hz signal and may also be called the narrowband signal component. The high-band signal (component) refers to the 4000 to 8000 Hz signal, and the ultra-high-band signal (component) refers to the 8000 to 16000 Hz signal.
In the G.729.1 standard, scalability is achieved by organizing the code stream as an embedded layered structure. Its core layer is coded according to the G.729 standard, making it a new type of embedded, layered, multi-rate speech codec. The input is a 20 ms superframe; at a sampling rate of 16000 Hz the frame length is 320 samples. The input signal s_WB(n) is first split into two subbands by QMF (Quadrature Mirror Filterbank) filtering with H1(z) and H2(z). The low-subband signal is preprocessed by a high-pass filter with a 50 Hz cutoff frequency, and the output signal s_LB(n) is encoded by a narrowband embedded CELP (Code-Excited Linear Prediction) encoder at 8 kb/s to 12 kb/s. The difference signal d_LB(n) between s_LB(n) and the local synthesis signal of the CELP encoder at 12 kb/s is passed through the perceptual weighting filter W_LB(z), and the weighted signal d_LB^w(n) is transformed to the frequency domain by MDCT (Modified Discrete Cosine Transform). The weighting filter W_LB(z) includes gain compensation to maintain spectral continuity between the filter output and the high-band input signal.
The high-band signal component is multiplied by (-1)^n to fold its spectrum. The folded signal is preprocessed by a low-pass filter with a cutoff frequency of 3000 Hz, and the filtered signal is encoded by the TDBWE (Time-Domain BandWidth Extension) encoder. The filtered signal that enters the TDAC (Time Domain Aliasing Cancellation) coding module is also first transformed to the frequency domain by MDCT.
The two sets of MDCT coefficients, one for the low band and one for the high band, are finally encoded using TDAC. In addition, some parameters are transmitted by the FEC (Frame Erasure Concealment) encoder to mitigate errors caused by frame loss during transmission.
FIG. 2 is a block diagram of the G.729.1 layered encoder system, in which the dashed part is the QMF filter bank used for band splitting. FIG. 3 is a block diagram of the G.729.1 layered decoder system. The actual operating mode of the decoder is determined by the number of code stream layers received, or equivalently by the received bit rate. The dashed part in FIG. 3 is the QMF filter bank used to synthesize the subbands into a full-band signal. The cases for the different bit rates received by the decoder are as follows:
1. If the received bit rate is 8 kb/s or 12 kb/s (that is, only the first layer or the first two layers are received): the code stream of the first layer or the first two layers is decoded by the embedded CELP decoder. The decoded signal is post-filtered and high-pass filtered, and then fed to the QMF filter bank to synthesize a 16 kHz wideband signal in which the high-band component is set to 0.
2. If the received bit rate is 14 kb/s (that is, the first three layers are received): in addition to the CELP decoder decoding the low-band signal component, the TDBWE decoder decodes the high-band signal component. An MDCT is applied to the high-band signal component, the frequency components above 3000 Hz in its spectrum (corresponding to frequencies above 7000 Hz at the 16 kHz sampling rate) are set to 0, and an inverse MDCT is applied. After overlap-add and spectral folding, the result is combined in the QMF filter bank with the low-band component decoded by the CELP decoder to synthesize a 16 kHz wideband signal.
3. If a code stream at a rate above 14 kb/s is received (corresponding to the first four layers or more): in addition to the CELP decoder decoding the low-band signal component and the TDBWE decoder decoding the high-band signal component, the TDAC decoder decodes the low-band weighted difference signal and the high-band enhancement signal to enhance the full-band signal. A 16 kHz wideband signal is finally synthesized in the QMF filter bank.
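The three cases above amount to selecting decoder stages by received bit rate. A minimal sketch of that selection follows, assuming the rate is given in kb/s; the function name and the stage labels are hypothetical and only mirror the modules named in the text.

```python
from typing import List


def g7291_decoder_stages(rate_kbps: float) -> List[str]:
    """Return the decoder stages used for a received G.729.1 bit rate.

    Follows the three cases in the text: 8 or 12 kb/s uses CELP only,
    14 kb/s adds TDBWE, and rates above 14 kb/s add TDAC as well.
    """
    if rate_kbps < 8 or rate_kbps > 32:
        raise ValueError("G.729.1 rates range from 8 kb/s to 32 kb/s")
    stages = ["CELP"]
    if rate_kbps >= 14:
        stages.append("TDBWE")
    if rate_kbps > 14:
        stages.append("TDAC")
    # Every case ends with QMF synthesis of the 16 kHz wideband output;
    # below 14 kb/s the high-band input to the filter bank is zero.
    stages.append("QMF synthesis")
    return stages


print(g7291_decoder_stages(12))
print(g7291_decoder_stages(14))
print(g7291_decoder_stages(32))
```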
The G.729.1 code stream has a layered structure, which allows outer-layer code streams to be discarded from the outside in during transmission according to the transmission capability of the channel, thereby adapting to channel conditions. However, for various reasons the G.729.1 standard does not yet define a discontinuous transmission mode for noise frames. This means that during the gaps in voice communication the encoder must still encode as it does for voice frames, which both increases the computational load of the encoder and wastes the limited transmission bandwidth of the channel. A discontinuous transmission mode for noise therefore needs to be introduced.
The prior art includes the DTX/CNG noise coding method of G.729 Annex B. For a signal frame judged to be background noise by VAD, the distortion of the current frame's spectrum and energy relative to the long-term average spectrum and energy is calculated; if it exceeds a certain threshold, the noise frame is encoded. The frame energy among the coding parameters is a smoothed frame energy, quantized with 5 bits. The line spectral frequency (LSF) parameters to be quantized are chosen between the current spectral parameters and the average spectral parameters of the past 6 frames. If the distance between the current LPC (Linear Prediction Coding) parameters and the average LPC parameters of the past 6 frames is greater than 1.12202, the current LSF parameters are quantized; otherwise the LSF parameters corresponding to the average LPC parameters of the past 6 frames are quantized. This selection scheme is therefore discontinuous with respect to the stationary character of background noise. A 1-bit predictor and a two-stage vector quantizer with 5 bits + 4 bits are used. The SID frame bit allocation is shown in Table 1:

Table 1 G.729 Annex B SID frame bit allocation
Figure imgf000007_0002
At the decoding end, the energy of each frame is computed by smoothing the decoded frame energy, while the LSF parameters are copied directly from the most recent SID frame.
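The encoder-side choice between the current and the averaged spectral parameters in G.729 Annex B, described above, can be sketched as follows. Only the threshold 1.12202 comes from the text. The function name is hypothetical, and the distance measure is passed in as a parameter because this section does not specify it; the sum of absolute differences used in the usage line is purely a stand-in.

```python
from typing import Callable, Sequence

# Threshold on the LPC distance, as quoted in the text.
LPC_DISTANCE_THRESHOLD = 1.12202


def choose_spectral_target(current_lpc: Sequence[float],
                           average_lpc: Sequence[float],
                           distance: Callable[[Sequence[float], Sequence[float]], float]) -> str:
    """Decide which spectral parameters the SID frame should quantize.

    Returns "current" when the current frame is far from the 6-frame
    average, and "average" otherwise.
    """
    if distance(current_lpc, average_lpc) > LPC_DISTANCE_THRESHOLD:
        return "current"
    return "average"


def stand_in_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Illustrative only; not the distance measure used by the standard.
    return sum(abs(x - y) for x, y in zip(a, b))


print(choose_spectral_target([1.0, 0.5], [1.0, 0.4], stand_in_distance))
```

The hard switch between the two targets is what the text calls discontinuous: a small change in distance around the threshold flips the quantized spectrum.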
The above noise coding method is applicable only to narrowband noise. It cannot handle wideband noise and lacks bandwidth scalability.
The prior art also includes DTX/CNG noise coding methods typified by AMR-WB (Adaptive Multi-Rate Wideband). AMR-WB operates on 16 kHz sampled input with 20 ms frames. Signal frames judged to be speech by VAD are encoded at a variable rate, while input judged to be background noise by VAD is encoded in a fixed manner: one 35-bit SID frame is output every 7 frames. The SID coding parameters mainly encode the energy and spectral parameters of the background noise. In the AMR-WB SID frame, the energy parameter is the logarithmic-domain energy of the current noise frame:

Figure imgf000007_0001
For the spectral parameters, AMR-WB uses ISF (Immittance Spectral Frequency) parameters. The ISF parameter is a 16-dimensional vector converted from 16th-order LPC (Linear Prediction Coding) coefficients. In the AMR-WB scheme, the energy and ISF parameters of frame j are both averages over the most recent 8 frames:

Figure imgf000008_0001

$$\mathrm{ISF}^{mean}(j) = \frac{1}{8}\sum_{n=0}^{7}\mathrm{ISF}(j-n) \qquad (3)$$

The average frame energy is quantized with 6 bits, and the spectral parameters are quantized using split vector quantization, in which the 16-dimensional ISF vector is divided into 5 sub-vectors that are quantized separately. The AMR-WB SID frame is 35 bits long; its bit allocation is shown in Table 2:
Table 2 AMR SID frame bit allocation
Figure imgf000008_0002
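The 8-frame averaging of the energy and ISF parameters described above for AMR-WB can be sketched with a sliding window. This is a minimal sketch: the class name is hypothetical, and averaging over fewer than 8 frames during start-up is an assumption, since the text does not say how the first frames are handled.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple


class SidParameterAverager:
    """Sliding-window mean of log energy and ISF vectors over recent frames."""

    def __init__(self, window: int = 8):
        self.energies: Deque[float] = deque(maxlen=window)
        self.isfs: Deque[Sequence[float]] = deque(maxlen=window)

    def push(self, log_energy: float, isf: Sequence[float]) -> None:
        # deque(maxlen=...) silently discards the oldest frame when full.
        self.energies.append(log_energy)
        self.isfs.append(list(isf))

    def average(self) -> Tuple[float, List[float]]:
        count = len(self.energies)
        if count == 0:
            raise ValueError("no frames pushed yet")
        mean_energy = sum(self.energies) / count
        # zip(*...) groups the same ISF dimension across all buffered frames.
        mean_isf = [sum(column) / count for column in zip(*self.isfs)]
        return mean_energy, mean_isf


averager = SidParameterAverager()
for i in range(10):
    averager.push(float(i), [float(i), 2.0 * i])
mean_energy, mean_isf = averager.average()
```

After ten frames only frames 2 to 9 remain in the window, so both averages are taken over exactly the last 8 frames.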
In the course of implementing the present invention, the inventors found that the prior art has at least the following problem: although the above scheme can encode wideband noise frames, it uses a fixed coding scheme for background noise and therefore likewise lacks bandwidth scalability.

Summary of the invention
Embodiments of the present invention provide a method and apparatus for encoding and decoding background noise, which can encode background noise with bandwidth scalability. An embodiment of the present invention provides a method for encoding and decoding background noise, including: when a received audio frame is a noise frame, selecting the noise frames that need to be encoded according to the transmission mode of the current noise frame;
performing layered coding on the noise frames that need to be encoded;
when a received audio frame is a layered-coded noise frame, decoding the coding parameters of the noise frame according to the transmission mode of the current noise frame; and
performing background noise reconstruction according to the coding parameters.
An embodiment of the present invention further provides an encoder, including:
a selecting unit, configured to, when a received audio frame is a noise frame, select the noise frames that need to be encoded according to the transmission mode of the current frame, and send the selection result to the coding unit; and a coding unit, configured to perform layered coding on the noise frames that need to be encoded, according to the result sent by the selecting unit.
An embodiment of the present invention further provides a decoder, including:
a decoding unit, configured to, when a received audio frame is a layered-coded noise frame, decode the coding parameters of the noise frame according to the transmission mode of the current noise frame; and
a reconstruction unit, configured to perform background noise reconstruction according to the coding parameters of the noise frame sent by the decoding unit.
The present invention further provides a codec system for background noise, including:
an encoder, configured to, when a received audio frame is a noise frame, select the noise frame to be encoded according to the transmission mode of the current noise frame and perform layered coding on the noise frame to be encoded; and
a decoder, configured to, when an audio frame received from the encoder is a layered-coded noise frame, decode the coding parameters of the noise frame according to the transmission mode of the current noise frame and perform background noise reconstruction according to the coding parameters.
Compared with the prior art, the embodiments of the present invention have the following advantages: with the method and apparatus provided by the embodiments, the encoding end selects the noise frame to be encoded according to the transmission mode of the current noise frame and applies layered coding to it, so that background noise frames are encoded with bandwidth scalability; the decoding end decodes the coding parameters of the noise frame according to the transmission mode of the received layered-coded noise frame and reconstructs the background noise, so that background noise is decoded with bandwidth scalability.
Brief description of the drawings
FIG. 1 is a schematic diagram of a prior-art method for compressing background noise in DTX mode;
FIG. 2 is a schematic diagram of the prior-art G.729.1 encoder system;
FIG. 3 is a schematic diagram of the prior-art G.729.1 decoder system;
FIG. 4 is a schematic flowchart of a background noise coding method according to Embodiment 1 of the present invention;
FIG. 5 is a schematic flowchart of a background noise coding method according to Embodiment 2 of the present invention;
FIG. 6 is a schematic diagram of the DTX noise coding implementation module according to Embodiment 2 of the present invention;
FIG. 7 is a schematic diagram of the TDBWE encoder system for background noise according to Embodiment 2 of the present invention;
FIG. 8 is a schematic diagram of the encoder system according to Embodiment 2 of the present invention;
FIG. 9 is a schematic diagram of the CNG noise decoding module at the decoding end according to Embodiment 2 of the present invention;
FIG. 10 is a schematic diagram of a method for recovering the low-band signal component from the reconstructed low-band coding parameters according to Embodiment 2 of the present invention;
FIG. 11 is a schematic diagram of a method for recovering the high-band signal component from the reconstructed high-band coding parameters according to Embodiment 2 of the present invention;
FIG. 12 is a schematic diagram of the decoder system according to Embodiment 2 of the present invention;
FIG. 13 is a schematic flowchart of a background noise coding method according to Embodiment 3 of the present invention;
FIG. 14 is a schematic diagram of the encoding-end system for noise frames according to Embodiment 3 of the present invention;
FIG. 15 is a schematic diagram of the decoding-end system for noise frames according to Embodiment 3 of the present invention;
FIG. 16 is a schematic diagram of an encoder according to Embodiment 5 of the present invention;
FIG. 17 is a schematic diagram of a decoder according to Embodiment 6 of the present invention.
Detailed description
The specific embodiments of the present invention are described in further detail below with reference to the drawings and embodiments.
In Embodiment 1 of the present invention, a method for encoding and decoding background noise is shown in FIG. 4. The specific steps are as follows:
Step S401: At the encoding end, voice activity detection (VAD) is applied to the input audio frame to determine the type of the current audio frame. If the current audio frame is a speech frame, it is encoded with the speech-frame coding algorithm. If the current frame is a noise frame and the previous frame was a speech frame (that is, a switch from speech frames to noise frames has just occurred), go to step S402.
Step S402: When a switch from speech frames to noise frames has just occurred, the encoder may first enter a hangover phase.
Specifically, during the hangover phase, that is, for N frames after the switch from speech frames to noise frames, the current noise frame is still encoded with the speech-frame coding algorithm, but at a reduced coding rate.
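The hangover behaviour of steps S401 and S402 can be sketched as a small frame classifier. This is only an illustrative sketch: the function name `classify_frames`, the label strings and the representation of the VAD output as a list of booleans are assumptions, not part of the described method.

```python
def classify_frames(vad_flags, hangover_frames):
    """Label each frame as 'speech', 'hangover' or 'noise'.

    vad_flags: iterable of booleans, True meaning the VAD detected speech.
    hangover_frames: N, the number of noise frames after a speech-to-noise
    switch that are still coded with the speech algorithm (at a reduced rate).
    """
    labels = []
    remaining = 0          # hangover frames still to be emitted
    prev_speech = False
    for is_speech in vad_flags:
        if is_speech:
            labels.append("speech")
            remaining = 0
        else:
            if prev_speech:
                # A speech -> noise switch starts a new hangover phase.
                remaining = hangover_frames
            if remaining > 0:
                labels.append("hangover")
                remaining -= 1
            else:
                labels.append("noise")
        prev_speech = is_speech
    return labels
```

Frames labelled `hangover` would go to the speech coder at a lower rate, and frames labelled `noise` would continue to step S403.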
Step S403: Select the noise frame to be encoded according to the transmission mode.
The current frame can be encoded and transmitted in one of two transmission modes: discontinuous transmission (DTX) mode or continuous transmission mode. In DTX mode, the encoder determines whether the current frame needs to be encoded; if so, the current frame is selected as the noise frame to be encoded, otherwise the current frame is not processed. In continuous transmission mode, the current frame is always selected as the noise frame to be encoded, that is, all received noise frames are encoded.
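The selection rule of step S403 reduces to a two-way branch, sketched below. The names `TransmissionMode` and `select_noise_frame` are hypothetical, and the DTX decision itself is passed in as a boolean because its criterion is described separately.

```python
from enum import Enum


class TransmissionMode(Enum):
    DTX = "discontinuous"
    CONTINUOUS = "continuous"


def select_noise_frame(mode, needs_update):
    """Return True if the current noise frame should be encoded.

    mode: the transmission mode of the current noise frame.
    needs_update: result of the DTX decision (only consulted in DTX mode).
    """
    if mode is TransmissionMode.CONTINUOUS:
        return True            # every received noise frame is encoded
    return bool(needs_update)  # DTX: encode only when the decision says so
```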
Step S404: Perform narrowband core layer coding on the noise frame to be encoded.
Specifically, the low-band signal component of the noise frame to be encoded is obtained, and core layer parameter coding is applied to it. The low-band signal component can be obtained in either of two ways: by band-splitting filtering of the noise frame, which divides it into a low-band signal component and a high-band signal component; or by high-pass filtering the full-band noise signal and downsampling it.
Narrowband core layer coding of the low-band signal component specifically includes: performing linear prediction analysis on the low-band signal component of the noise frame to obtain the linear prediction coefficients and the signal energy; converting the linear prediction coefficients into spectral parameters and vector quantizing them to obtain the quantized spectral parameters; quantizing the signal energy in the log domain to obtain the frame energy; and using the quantized spectral parameters and the frame energy as the narrowband core layer parameters of the noise frame.
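A minimal sketch of the analysis part of step S404 is shown below. It uses the standard Levinson-Durbin recursion for the linear prediction analysis and a uniform scalar quantizer for the log energy. The 1 dB step size is an assumption (the text does not give one), and the conversion of the LPC coefficients to spectral parameters and their vector quantization are not shown.

```python
import math


def autocorrelation(x, order):
    """Autocorrelation r[0..order] of the (already windowed) signal x."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve for A(z) = 1 + sum_i a[i] z^-i; returns (a, prediction_error)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err


def quantize_log_energy(energy, step_db=1.0):
    """Uniformly quantize 10*log10(energy); step_db is an assumed step size."""
    level_db = 10.0 * math.log10(max(energy, 1e-10))
    index = round(level_db / step_db)
    return index, index * step_db
```

For an autocorrelation sequence `[1.0, 0.5, 0.25]` the recursion returns a first coefficient of -0.5 and a second coefficient of zero, which is the expected result for a first-order autoregressive signal.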
Step S405: If extension layer coding is also required, extension layer coding is applied to the noise frame after narrowband core layer coding.
Specifically, narrowband enhancement layer coding of the noise frame quantizes the quantization error of the spectral parameters and the quantization error of the signal energy in the narrowband core layer. Wideband extension layer coding of the noise frame applies extension parameter coding to the high-band signal component of the noise frame. There may be one extension layer or several. The wideband extension layer includes a wideband core layer and a wideband enhancement layer.
Wideband extension layer coding of the noise frame specifically includes: obtaining the time-domain envelope and the frequency-domain envelope of the high-band signal component; subtracting the quantized time-domain envelope from each component of the frequency-domain envelope; splitting the resulting vector into several subvectors; and quantizing each subvector to obtain the wideband extension layer coding parameters.
Alternatively, the TDAC coding algorithm is used for wideband extension layer coding of the low-band residual signal component and the high-band signal component of the noise frame, specifically: the low-band signal component is reconstructed; the reconstructed low-band signal component is upsampled and spectrally extended to obtain a reconstructed wideband signal; the residual between the original wideband signal and the reconstructed wideband signal is MDCT transformed; and the resulting MDCT coefficients are quantized and encoded to obtain the wideband extension layer parameters.
Step S406: After encoding is complete, the encoded noise frame is transmitted.
Step S407: At the decoding end, the coding parameters are decoded from the received bitstream and the type of the current audio frame is determined. If the current audio frame is a speech frame, it is decoded with the speech-frame decoding algorithm; otherwise, go to step S408.
Step S408: If the received audio frame is a noise frame, the coding parameters of the noise frame are decoded according to the transmission mode of the current noise frame.
Specifically, when the transmission mode of the current noise frame is discontinuous transmission, the coding parameters of each received noise frame are decoded; for noise frames that were not transmitted, the coding parameters of the current noise frame are derived from previously received noise frames or from the coding parameters buffered during the hangover phase.
When the transmission mode of the current noise frame is continuous transmission, the coding parameters are decoded from each received noise frame.
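The parameter recovery of step S408 in DTX mode can be sketched as a simple fallback chain. The function name, the representation of an untransmitted frame as `None` and the order of the two fallbacks (last received noise frame first, hangover buffer second) are assumptions; the text only states that either source may be used.

```python
def recover_noise_parameters(received, last_params, hangover_params):
    """Return the coding parameters to use for the current noise frame.

    received: parameters decoded from the current frame, or None if no frame
              was transmitted (possible only in DTX mode).
    last_params: parameters of the most recently received noise frame, or None.
    hangover_params: parameters buffered during the hangover phase.
    """
    if received is not None:
        return received          # a noise frame arrived: use it directly
    if last_params is not None:
        return last_params       # reuse the previous noise frame
    return hangover_params       # fall back to the hangover buffer
```

In continuous transmission mode `received` is never `None`, so only the first branch is taken.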
Step S409: Perform background noise reconstruction according to the decoded coding parameters.
Specifically, when the received noise frame contains only the narrowband core layer, or both the narrowband core layer and the narrowband enhancement layer, the coefficients of the synthesis filter are computed from the reconstructed spectral parameters, Gaussian random noise is used as the excitation and passed through the synthesis filter, and the output is shaped in the time domain using the reconstructed energy parameter to reconstruct the background noise signal. Alternatively, the low-band coding parameters are CELP decoded to obtain the decoded low-band signal component, which is upsampled to a full-band signal and spectrally extended to reconstruct the background noise signal.
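The first reconstruction path of step S409 (Gaussian excitation, synthesis filtering, energy shaping) can be sketched as follows. This is a simplified illustration: it takes the LPC coefficients directly instead of deriving them from the spectral parameters, treats the energy parameter as mean power per sample, and applies a single gain to the whole frame. All of these are assumptions.

```python
import math
import random


def synthesize_comfort_noise(lpc, target_energy, length, seed=0):
    """Generate `length` samples of comfort noise.

    lpc: [1, a1, ..., aM] describing A(z); the synthesis filter is 1/A(z).
    target_energy: desired mean power per sample of the output (assumed
                   meaning of the decoded energy parameter).
    """
    rng = random.Random(seed)
    order = len(lpc) - 1
    out = []
    for n in range(length):
        excitation = rng.gauss(0.0, 1.0)       # Gaussian random excitation
        # All-pole synthesis: y(n) = e(n) - sum_i a_i * y(n - i)
        feedback = sum(lpc[i] * out[n - i]
                       for i in range(1, order + 1) if n - i >= 0)
        out.append(excitation - feedback)
    energy = sum(v * v for v in out) / length
    gain = math.sqrt(target_energy / energy) if energy > 0 else 0.0
    return [gain * v for v in out]             # energy shaping
```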
When the received noise frame also contains the wideband extension layer, the background noise signal can be reconstructed from the noise frame with either the TDBWE decoding algorithm or the TDAC decoding algorithm.
Reconstruction with the TDBWE decoding algorithm proceeds as follows: the coefficients of the synthesis filter are computed from the reconstructed spectral parameters, Gaussian random noise is used as the excitation and passed through the synthesis filter, and the output is shaped in the time domain using the reconstructed energy parameter to obtain the low-band signal component of the background noise signal; Gaussian random noise is used as the excitation source and is shaped in the time domain and in the frequency domain using the reconstructed high-band coding parameters to reconstruct the high-band signal component of the background noise signal; the reconstructed low-band and high-band signal components are then combined by QMF synthesis filtering to obtain the background noise signal.
Reconstruction with the TDAC decoding algorithm proceeds as follows: the low-band signal component is decoded from the low-band coding parameters with the CELP decoding algorithm, then upsampled and spectrally extended to obtain a full-band signal; the reconstructed high-band coding parameters are dequantized and inverse MDCT transformed to obtain the residual signal, which is combined with the full-band signal to obtain the wideband background noise signal.
Embodiment 2 of the present invention takes as an example the case where the high-band signal component is encoded with the TDBWE coding algorithm. The method for encoding and decoding background noise is shown in FIG. 5, and the specific steps are as follows:
Step S501: At the encoding end, one frame of data with a length of 20 ms and a sampling rate of 16000 Hz is input, and VAD is applied to the input audio frame to determine the type of the current frame. If the current frame is a speech frame, go to step S502. If the current frame is a noise frame and the previous frame was a speech frame (that is, a switch from speech frames to noise frames has just occurred), go to step S503.
Specifically, the frame structure of the full-rate speech frame used in this embodiment is shown in Table 3.
Table 3 Frame structure of full-rate speech frames
[The first rows of Table 3 appear only as an image in the source (Figure imgf000014_0001).]
Parameter | Bit allocation | Total bits
Fixed codebook index | 13 13 13 13 | 52
Fixed codebook sign | 4 4 4 4 | 16
Codebook gain (first stage) | 3 3 3 3 | 12
Codebook gain (second stage) | 4 4 4 4 | 16
8 kb/s core layer total | | 160
Layer 2 - narrowband enhancement layer (narrowband embedded CELP)
Second-stage fixed codebook index | 13 13 13 13 | 52
Second-stage fixed codebook sign | 4 4 4 4 | 16
Second-stage fixed codebook gain | 3 2 3 2 | 10
Error correction bits (class information) | 1 1 | 2
12 kb/s enhancement layer total | | 80
Layer 3 - wideband enhancement layer (TDBWE)
Time-domain envelope mean | 5 | 5
Time-domain envelope split vector | 7+7 | 14
Frequency-domain envelope split vector | 5+5+4 | 14
Error correction bits (phase information) | 7 | 7
14 kb/s enhancement layer total | | 40
Layers 4 to 12 - wideband enhancement layer (TDAC)
Error correction bits (energy information) | 5 | 5
MDCT normalization factor | 4 | 4
High-band spectral envelope | nbits_HB | nbits_HB
Low-band spectral envelope | nbits_LB | nbits_LB
Fine structure | nbits_VQ = 351 - nbits_HB - nbits_LB | nbits_VQ
16~32 kb/s enhancement layers total | | 360
Total | | 640
The frame structure of the full-rate noise frame used in this embodiment is shown in Table 4:
Table 4 Frame structure of full-rate noise frames
[Table 4 appears only as an image in the source (Figure imgf000016_0001).]
Step S502: If the current frame is a speech frame, it is encoded with the speech-frame coding algorithm, producing an encoded bitstream of up to 32 kb/s.
Step S503: If a switch from speech frames to noise frames has just occurred, the encoder may first enter the hangover phase.
Specifically, the hangover phase lasts N frames: for N frames after the switch from speech frames to noise frames, the current noise frame is still encoded with the speech-frame coding algorithm, but at a reduced coding rate. For example, if the coding rate of the speech frame before the switch is [...] encoded; if the coding rate of the speech frame before the switch is 8 kb/s or 12 kb/s, then [...]. The hangover phase can also be used to learn the noise parameters: the autocorrelation function of the low-band signal component, the low-band coding parameters and the high-band coding parameters are buffered during the hangover phase and used to initialize the coding of subsequent noise frames.
After the hangover phase ends, the current frame can be encoded and transmitted in one of two transmission modes: discontinuous transmission (DTX) mode or continuous transmission mode. If DTX mode is used, step S504 is performed; if continuous transmission mode is used, all received noise frames are encoded and steps S505 to S507 are performed directly.
Step S504: Determine whether the current noise frame needs to be encoded. If so, go to step S505; otherwise the current frame is not processed.
Specifically, the DTX strategy can be determined by a specific criterion that decides whether the current frame needs to be encoded: the distortion of the spectrum and energy of the current noise frame relative to the long-term average spectrum and energy (that is, the average spectrum and energy of the previously buffered coding parameters) is computed. If the distortion exceeds a specific threshold, the noise frame is encoded; otherwise the current frame is not processed.
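The DTX decision of step S504 can be sketched as a threshold test. The text does not specify the distortion measures or the thresholds, so the squared-error spectral distance, the absolute energy difference in dB, the use of two separate thresholds and the name `should_send_sid` are all assumptions made for illustration.

```python
def should_send_sid(spectrum, energy_db, avg_spectrum, avg_energy_db,
                    spectral_threshold, energy_threshold_db):
    """Return True if the current noise frame should be encoded and sent.

    spectrum / avg_spectrum: spectral parameters of the current frame and
        their long-term average (sequences of equal length).
    energy_db / avg_energy_db: frame energy and its long-term average, in dB.
    """
    # Assumed distortion measures: squared error for the spectrum,
    # absolute difference for the energy.
    spectral_distortion = sum((s - a) ** 2
                              for s, a in zip(spectrum, avg_spectrum))
    energy_distortion = abs(energy_db - avg_energy_db)
    return (spectral_distortion > spectral_threshold
            or energy_distortion > energy_threshold_db)
```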
In DTX mode, the modules that implement noise frame coding are shown in FIG. 6.
Step S505: Perform narrowband core layer coding on the current noise frame.
Specifically, in this embodiment the narrowband core layer parameters may be encoded using the CELP model. A background noise frame that requires SID coding and transmission is passed through a QMF band-splitting filter and divided by frequency into several subbands. This embodiment takes the simplest case and divides the background noise frame into two subbands: a low-band signal component $s_{LB}(n)$ covering 0~4000 Hz and a high-band signal component $s_{HB}(n)$ covering 4000~8000 Hz. The low-band signal component $s_{LB}(n)$ is windowed, its autocorrelation function is computed, and LPC analysis is performed, yielding the LPC coefficients $a_i$ (where $i = 1, 2, \ldots, M$) and the signal energy $E$. Usually the autocorrelation function is smoothed appropriately before the LPC analysis so that smooth LPC coefficients are obtained. The LPC coefficients $a_i$ are converted into spectral parameters $\Omega = [\omega_i,\ i = 1, 2, \ldots, M]$, where $M$ is the linear prediction order, and the spectral parameters $\Omega$ are vector quantized to obtain the quantized spectral parameters $\hat{\Omega}$. The signal energy $E$ is quantized in the log domain to obtain the frame energy $\hat{E}$. The quantized spectral parameters $\hat{\Omega}$ and the frame energy $\hat{E}$ form the narrowband core layer parameters of the background noise.
Step S506: If extension layer parameter coding is required, it is applied to the noise frame after narrowband core layer coding.
Specifically, if narrowband enhancement layer coding is required, the quantization errors of the spectral parameters and of the energy parameter in the narrowband core layer are quantized further. That is, if the spectral parameters before quantization are $\Omega$ and the spectral parameters after core layer quantization are $\hat{\Omega}$, the narrowband enhancement layer quantizes $\Omega - \hat{\Omega}$, and the quantization result is an index into the spectral quantization codebook of the enhancement layer. The energy parameter is handled in a similar way by quantizing $E - \hat{E}$, which yields the noise frame encoded with the narrowband enhancement layer.
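The refinement idea of the narrowband enhancement layer (quantize the error left by the core layer) can be illustrated with a two-stage scalar quantizer. The text uses codebook-based quantization for the spectral parameters; the uniform scalar quantizers, step sizes and function names below are stand-ins chosen only to keep the sketch short.

```python
def uniform_quantize(value, step):
    """Return (index, reconstructed value) of a uniform scalar quantizer."""
    index = round(value / step)
    return index, index * step


def encode_two_stage(value, core_step, enh_step):
    """Core layer quantizes the value; enhancement layer quantizes the error."""
    core_index, core_value = uniform_quantize(value, core_step)
    enh_index, _ = uniform_quantize(value - core_value, enh_step)
    return core_index, enh_index


def decode_two_stage(core_index, enh_index, core_step, enh_step,
                     use_enhancement=True):
    """Decode with or without the enhancement layer index."""
    value = core_index * core_step
    if use_enhancement:
        value += enh_index * enh_step   # add the refined quantization error
    return value
```

A decoder that receives only the core layer calls `decode_two_stage(..., use_enhancement=False)` and still obtains a coarser but valid parameter, which is the property that makes the layered bitstream scalable.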
If wideband core layer coding is also required, extension parameter coding is applied to the noise frame after narrowband enhancement layer coding. Specifically, the high-band signal component split from the background noise frame is encoded with the TDBWE coding algorithm, as shown in FIG. 7: first the time-domain envelope and the frequency-domain envelope of the high-band signal component are computed. The time-domain envelope is computed as in formula (1):
[Formula (1) appears only as an image in the source (Figure imgf000019_0001).]
where I is the number of time-domain envelopes.
The frequency-domain envelope is computed as follows. First, the high-band signal component is windowed with a 128-tap Hanning window; the window function is given by formula (2):
[Formula (2) appears only as an image in the source (Figure imgf000019_0002).]
The windowed high-band signal component is:
$$s_{HB}^{w}(n) = s_{HB}(n)\, w_F(n + 31), \quad n = -31, \ldots, 96 \qquad (3)$$
A 128-point FFT (Fast Fourier Transform) is applied to the windowed signal, implemented with a polyphase structure:
$$S_{HB}(k) = \mathrm{FFT}_{64}\big(s_{HB}^{w}(n) + s_{HB}^{w}(n + 64)\big), \quad k = 0, \ldots, 63,\ n = -31, \ldots, 32 \qquad (4)$$
The weighted frequency-domain envelope is obtained from the computed FFT coefficients. In G.729.1 only the 4000~7000 Hz band of the full band needs to be encoded, so for the high-band signal component only the weighted frequency-domain envelope of its 0~3000 Hz band (corresponding to the first 25 FFT coefficients) needs to be computed, namely:
[Formula (5) appears only in part in the source (Figure imgf000019_0003); its index range is $j = 0, \ldots, J-1$.]
where J is the number of frequency-domain envelopes. The embodiment of the present invention can also be applied to compute the frequency-domain envelope of any frequency range within the high band, and the number of frequency-domain envelopes can be any value greater than 0, so it is not limited to the application in G.729.1.
For the coding of background noise, the human ear cannot resolve the time-domain envelope of the noise very finely, so it is unnecessary to divide the frame into 16 time-domain envelopes as for a speech frame; only the average time-domain envelope of the whole frame needs to be computed, as shown in formula (6):
[Formula (6) appears only as an image in the source (Figure imgf000020_0001); its index range is $i = 0, \ldots, I-1$.]
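The polyphase FFT structure of formula (4) relies on the fact that adding the two halves of a 128-sample block and taking a 64-point transform yields the even-indexed bins of the 128-point transform. The sketch below checks this identity with a plain DFT; the function names and the 0-based sample indexing are assumptions, and the Hanning window of formula (2) is not reproduced.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) discrete Fourier transform, for illustration only."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def polyphase_even_bins(x):
    """Even bins of the len(x)-point DFT via one half-length DFT.

    Folds the second half of the block onto the first half, as in
    formula (4), then transforms the folded block.
    """
    half = len(x) // 2
    folded = [x[t] + x[t + half] for t in range(half)]
    return dft(folded)
```

With 64 bins spanning the 8000 Hz sampling range of the high-band signal, the bin spacing is 125 Hz, so the 0~3000 Hz range corresponds to bins 0 to 24, which matches the 25 coefficients mentioned in the text.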
The resulting time-domain envelope is quantized with a 5-bit uniform quantizer with a step size of 3 dB, and the quantized time-domain envelope is denoted $\hat{T}$. The vector obtained by subtracting $\hat{T}$ from each component of the J-dimensional frequency-domain envelope is then split into 3 subvectors, which are quantized separately. The quantized time-domain envelope and frequency-domain envelope are output through a multiplexer, yielding the noise frame encoded with the wideband extension layer.
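The envelope quantization described above can be sketched as follows. The 5-bit width, the 3 dB step and the split into 3 subvectors come from the text; the assumption that the envelopes are expressed in dB, that the quantizer range starts at 0 dB, and that the subvectors have equal length are illustrative choices. The quantization of each subvector is not shown.

```python
def quantize_time_envelope(mean_env_db, bits=5, step_db=3.0):
    """5-bit uniform quantizer with a 3 dB step (range start of 0 dB assumed).

    Returns (index, quantized value in dB).
    """
    levels = 1 << bits
    index = min(max(round(mean_env_db / step_db), 0), levels - 1)
    return index, index * step_db


def split_frequency_envelope(freq_env_db, quantized_time_env_db, parts=3):
    """Subtract the quantized time envelope and split into subvectors."""
    residual = [f - quantized_time_env_db for f in freq_env_db]
    size = -(-len(residual) // parts)   # ceiling division
    return [residual[i:i + size] for i in range(0, len(residual), size)]
```

For a 12-dimensional frequency-domain envelope this produces three 4-dimensional subvectors, each of which would then be quantized separately.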
Step S507: After encoding is complete, the encoded noise frame is transmitted.
The encoder system of this embodiment of the present invention is shown in FIG. 8.
The steps above describe how the encoding end processes noise frames in this embodiment. The corresponding decoding procedure is as follows:
Step S508: At the decoding end, the coding parameters are decoded from the received bitstream and the type of the current frame is determined. If the current frame is a speech frame, it is decoded with the speech-frame decoding algorithm; if the current frame is a noise frame, go to step S509.
Specifically, during transmission of the bitstream, a media gateway may discard coded bits layer by layer, from the outer layers to the inner layers, according to channel conditions in order to match the transmission capacity of the channel. Therefore, even if the encoder sends full-rate coded frames, the decoder may not receive the full-rate bitstream. The decoder can only decode at the rate that corresponds to the bitstream it actually received.
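The effect of layer dropping on the decodable rate can be sketched with the speech-frame layer sizes of Table 3: 160, 80 and 40 bits for layers 1 to 3, and 360 bits in total for layers 4 to 12. The equal split of those 360 bits into nine 40-bit layers is an assumption, as are the names used here. The noise-frame layout of Table 4 is not available in the text, so it is not modelled.

```python
# Bits per 20 ms frame for each layer of a full-rate speech frame.
# Layers 1-3 follow Table 3; the nine TDAC layers are assumed to carry
# 40 bits each (360 bits in total).
LAYER_BITS = [160, 80, 40] + [40] * 9


def decodable_rate_kbps(received_bits, frame_ms=20):
    """Highest rate whose layers are all completely present in the frame."""
    total = 0
    for bits in LAYER_BITS:
        if total + bits > received_bits:
            break              # this layer was truncated or dropped
        total += bits
    return total / frame_ms    # bits per millisecond equals kbit/s
```

For example, a frame truncated to 250 bits contains the complete core layer and narrowband enhancement layer (240 bits) but not the full TDBWE layer, so it is decoded at 12 kb/s.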
Step S509: Reconstruct the coding parameters of the received noise frame, and reconstruct the background noise signal from those coding parameters.
If DTX mode is used, the decoder receives SID frames only intermittently during a noise segment. For each received noise frame the coding parameters are reconstructed; for frames that were not transmitted, the coding parameters of the current frame are reconstructed from previously received noise frames or from the noise parameters learned during the hangover phase, and the background noise is then reconstructed. The decoding module in DTX mode is shown in FIG. 9.
If continuous transmission mode is used, the coding parameters are reconstructed for all received noise frames and the background noise is reconstructed.
When the received noise frame contains only the narrowband core layer, the narrowband core layer coding parameters $[\hat{\Omega}, \hat{E}]$ are decoded. A synthesis filter is constructed from the reconstructed spectral parameters $\hat{\Omega}$, Gaussian random noise is used as the excitation signal and passed through this filter, and the filtered signal is shaped using the decoded energy parameter $\hat{E}$, which reconstructs the low-band signal component of the background noise, as shown in FIG. 10. If the decoder is also required to output a wideband signal, the high-band signal component is set to 0 and combined with the reconstructed low-band signal component by the QMF synthesis filter to form the wideband output. If the decoder is not required to output a wideband signal, the reconstructed low-band signal component is output directly.
When the received noise frame also contains the narrowband enhancement layer, no new parameters are added, because the narrowband enhancement layer only improves the quantization precision of the core layer spectral parameters and energy parameter. The decoded spectral parameters and energy parameter are therefore processed in the same way as for a bitstream containing only the narrowband core layer to obtain the reconstructed wideband or narrowband background noise signal.
When the received noise frame also contains the wideband core layer, the low-band and high-band coding parameters of the noise frame are reconstructed. Low-band parameters (such as the pitch delay, the fixed codebook gain and the adaptive codebook gain) are reconstructed from the reconstructed low-band coding parameters or from the reconstructed low-band signal component, and Gaussian random noise is shaped with these low-band parameters to obtain the excitation source. The excitation source is then shaped in the time domain and in the frequency domain using the reconstructed high-band coding parameters, which yields the high-band signal component of the noise frame. The reconstructed high-band and low-band signal components are combined by QMF filtering to reconstruct the full-band background noise frame, as shown in FIG. 11.
The decoder system of this embodiment of the present invention is shown in FIG. 12.
Embodiment 3 of the present invention takes coding of the high-band signal component with the TDAC coding algorithm as an example. The background noise encoding and decoding method is shown in FIG. 13, and its steps are as follows:
Step S1301: At the encoder, VAD is applied to the input audio frame to determine the type of the current frame. If the current frame is a speech frame, go to step S1302. If the current frame is a noise frame and the previous frame was a speech frame (that is, a switch from a speech frame to a noise frame has just occurred), go to step S1303.
The frame structure of the full-rate noise frame used in this embodiment is shown in Table 5.
Table 5: Bit allocation of the noise frame
[Table 5 appears only as an image in the source: Figure imgf000022_0001]
Step S1302: If the current frame is a speech frame, it is encoded with the speech frame coding algorithm, which can produce a coded bitstream of up to 32 kb/s.
Step S1303: If a switch from a speech frame to a noise frame has just occurred, a hangover phase may be entered first.
Specifically, the hangover phase lasts N frames: for N frames after the switch from speech frames to noise frames, the current noise frame is still encoded with the speech frame coding algorithm, but at a reduced coding rate. For example, if the coding rate of the speech frame before the switch is [...] encoding is performed; if the coding rate of the speech frame before the switch is 8 kb/s or 12 kb/s, then [...]. The hangover phase can also be used to learn the noise parameters: the autocorrelation function of the low-band signal component, the low-band coding parameters, and the high-band coding parameters are buffered during the hangover phase and used to initialize the coding of subsequent noise frames.
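The hangover behaviour described above can be pictured as a small state machine. The sketch below is only an illustration: the class name, the default value of N, and the action labels it returns are assumptions, and the source does not state the actual reduced rates.

```python
from dataclasses import dataclass, field


@dataclass
class HangoverTracker:
    """Tracks the hangover phase after a speech-to-noise switch.

    hangover_frames plays the role of N in the text; its default value
    and the returned action labels are illustrative assumptions.
    """
    hangover_frames: int = 6
    remaining: int = 0
    prev_was_speech: bool = False
    cached_params: list = field(default_factory=list)

    def next_action(self, is_speech, params=None):
        if is_speech:
            self.prev_was_speech = True
            self.remaining = 0
            return "speech_coder_full_rate"
        if self.prev_was_speech:
            # Speech -> noise switch: start the hangover phase and begin
            # learning the noise parameters afresh.
            self.remaining = self.hangover_frames
            self.cached_params.clear()
        self.prev_was_speech = False
        if self.remaining > 0:
            self.remaining -= 1
            # Buffer parameters to initialise later noise-frame coding.
            self.cached_params.append(params)
            return "speech_coder_reduced_rate"
        return "noise_coder"
```

With `hangover_frames=2`, the frame sequence speech, noise, noise, noise produces one full-rate action, two reduced-rate actions, and then a hand-over to the noise coder.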
After the hangover phase ends, the current frame can be encoded and transmitted in one of two transmission modes: discontinuous transmission (DTX) mode or continuous transmission mode. If the current frame is encoded and transmitted in DTX mode, step S1304 is performed. If the continuous transmission mode is used, all received noise frames are encoded, and steps S1305 to S1307 are performed directly.
Step S1304: Determine whether the current noise frame needs to be encoded. If it does, go to step S1305; otherwise, the current frame is not processed.
The method for determining whether the current frame needs to be encoded is the same as in step S504 of Embodiment 2 and is not repeated here.
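Claim 5 of this document describes the DTX decision as comparing the spectrum and energy of the noise frame against their long-term averages and encoding the frame only when the distortion exceeds a threshold. A minimal sketch of that idea follows. The distortion measures (mean squared spectral difference, absolute energy difference in dB), the threshold values, the way the two tests are combined, and the averaging rule are all assumptions made for illustration.

```python
def should_encode_noise_frame(spec, energy_db, avg_spec, avg_energy_db,
                              spec_threshold=0.05, energy_threshold=2.0):
    """Return True when the frame deviates enough from the long-term average.

    spec / avg_spec are hypothetical spectral parameter vectors of equal
    length; the thresholds are placeholders, not values from the source.
    """
    spec_dist = sum((s - a) ** 2 for s, a in zip(spec, avg_spec)) / len(spec)
    energy_dist = abs(energy_db - avg_energy_db)
    return spec_dist > spec_threshold or energy_dist > energy_threshold


def update_long_term_average(avg, new, alpha=0.9):
    """Exponential moving average, one common way to track a long-term mean."""
    return [alpha * a + (1.0 - alpha) * n for a, n in zip(avg, new)]
```

Frames for which the function returns False would simply not be transmitted in DTX mode.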
Step S1305: The full-band noise signal is high-pass filtered and downsampled to obtain the low-band signal component of the noise frame.
Specifically, the low-band signal component of the noise frame can be obtained either by the QMF filtering method of Embodiment 2 or by high-pass filtering and downsampling. This embodiment uses high-pass filtering and downsampling.
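As a rough illustration of high-pass filtering followed by downsampling, the sketch below applies a generic second-order recursive filter and then keeps every second sample. The coefficients are placeholders chosen so that the filter is stable and blocks DC; they are not the coefficients given in the patent, and the function names are hypothetical.

```python
def biquad_filter(x, b, a):
    """Direct-form I second-order filter.

    b = (b0, b1, b2) and a = (a1, a2) describe
    H(z) = (b0 + b1*z^-1 + b2*z^-2) / (1 + a1*z^-1 + a2*z^-2).
    """
    b0, b1, b2 = b
    a1, a2 = a
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def downsample_by_two(y):
    """Keep every second sample, starting with the first."""
    return y[::2]


# Placeholder coefficients (double zero at DC, double pole at 0.95).
# They are NOT the coefficients of the patent's filter.
EXAMPLE_B = (0.95, -1.9, 0.95)
EXAMPLE_A = (-1.9, 0.9025)
```

For example, `downsample_by_two(biquad_filter(frame, EXAMPLE_B, EXAMPLE_A))` halves the sampling rate of `frame` after removing its DC component.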
The noise signal x(n) can be high-pass filtered with a second-order elliptic high-pass filter to obtain the filtered noise signal y(n). The transfer function is given by formula (7):
$$H(z) = \frac{0.95551031152729 - 1.91102039813878\,z^{-1} - 0.9555103152729\,z^{-2}}{1 - 1.96646455789804\,z^{-1} + 9.671820760729101\,z^{-2}} \tag{7}$$
The relationship between the filter input x(n) and the filter output y(n) is given by formula (8):
$$y(n) = 1.96646455789804\,y(n-1) + 9.671820760729101\,y(n-2) + 0.95551031152729\,x(n) - 1.91102039813878\,x(n-1) - 0.9555103152729\,x(n-2) \tag{8}$$
The high-pass filtered noise signal y(n) is downsampled to obtain the low-band signal component $y_l(n)$, as given by formula (9):
$$y_l(n) = y(2n) \tag{9}$$
Step S1306: The low-band signal component of the noise frame is pre-emphasized and then CELP-encoded to obtain the low-band coding parameters of the noise frame. The noise frame may contain only the narrowband core layer parameters, or both the narrowband core layer and the narrowband enhancement layer.
Specifically, LPC analysis is first performed on the low-band signal component $y_l(n)$: the signal is windowed, its autocorrelation function is computed, and the LPC analysis yields the LPC coefficients $a_i$ (where $i = 1, 2, \ldots, M$) and the residual energy $E$. The autocorrelation function is usually smoothed appropriately before the LPC analysis so that smooth LPC coefficients are obtained. The LPC coefficients are converted into spectral parameters $\Omega = [\omega_i,\ i = 1, 2, \ldots, M]$, where M is the linear prediction order, and the spectral parameters are vector quantized to give the quantized spectral parameters $\hat{\Omega}$. The residual energy $E$ is logarithmically quantized to give the frame energy $\hat{E}$. The quantized spectral parameters $\hat{\Omega}$ and the frame energy $\hat{E}$ form the narrowband core layer parameters $[\hat{\Omega}, \hat{E}]$ of the background noise. Further quantizing the quantization errors of the spectral parameters and of the energy parameter in the narrowband core layer yields the narrowband enhancement layer of the noise frame.
Step S1307: The low-band signal component is reconstructed from the low-band coding parameters obtained for the noise frame.
Specifically, a synthesis filter is constructed from the reconstructed spectral parameters $\hat{\Omega}$, Gaussian random noise is used as the excitation signal and passed through the synthesis filter, and the filter output is shaped with the decoded energy parameter $\hat{E}$ to reconstruct the low-band signal component of the background noise.
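The encoder-side analysis of step S1306 (autocorrelation, LPC analysis, logarithmic energy quantization) could look roughly like the sketch below. It uses the standard Levinson-Durbin recursion; the function names, the sign convention A(z) = 1 + a1*z^-1 + ... + aM*z^-M, and the uniform dB quantizer step are assumptions, and the conversion to spectral parameters and their vector quantization are left out.

```python
import math


def autocorrelation(frame, order):
    """r[k] = sum_n frame[n] * frame[n + k] for k = 0..order."""
    return [sum(frame[n] * frame[n + k] for n in range(len(frame) - k))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Levinson-Durbin recursion.

    Returns ([a1, ..., aM], residual_energy) for the prediction error
    filter A(z) = 1 + a1*z^-1 + ... + aM*z^-M.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. all-zero) input
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def quantize_log_energy(energy, step_db=2.0):
    """Uniform quantization of the energy in dB (step size is a placeholder).

    Returns (index, decoded_energy).
    """
    level_db = 10.0 * math.log10(max(energy, 1e-10))
    index = round(level_db / step_db)
    return index, 10.0 ** (index * step_db / 10.0)
```

For the autocorrelation sequence `[1.0, 0.5, 0.25]` of a first-order process, the recursion returns coefficients close to `[-0.5, 0.0]` and a residual energy of 0.75.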
Step S1308: The reconstructed low-band signal component is upsampled to the original sampling rate and spectrally extended to obtain the reconstructed full-band signal.
Specifically:
[The details appear only as an image in the source: Figure imgf000025_0001]
Step S1309: An MDCT is applied to the residual between the original full-band signal and the reconstructed full-band signal, and the MDCT coefficients are quantized and encoded to obtain the high-band coding parameters of the noise frame, from which the high-band signal component of the noise frame is reconstructed. The noise frame may contain only the wideband core layer, or both the wideband core layer and the wideband enhancement layer.
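The residual coding of step S1309 rests on the MDCT and its overlap-add inverse. The sketch below shows a direct (slow) MDCT with a sine window applied at both analysis and synthesis, plus a uniform scalar quantizer of the coefficients. The block size, the window, the quantizer, and all function names are assumptions for illustration and do not reproduce the TDAC quantization of the source.

```python
import math


def sine_window(two_n):
    """Sine window; it satisfies w[n]^2 + w[n + N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]


def _basis(n, k, n_half):
    return math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))


def mdct(block):
    """Transform 2N windowed samples into N coefficients."""
    n_half = len(block) // 2
    return [sum(block[n] * _basis(n, k, n_half) for n in range(2 * n_half))
            for k in range(n_half)]


def imdct(coeffs):
    """Inverse transform: N coefficients to 2N time-aliased samples."""
    n_half = len(coeffs)
    return [2.0 / n_half * sum(coeffs[k] * _basis(n, k, n_half)
                               for k in range(n_half))
            for n in range(2 * n_half)]


def code_residual(original, reconstructed, n_half=8, step=None):
    """Encode and decode (original - reconstructed); return the decoded residual.

    The signal length must be a multiple of n_half. With step=None the
    coefficients are left unquantized.
    """
    residual = [o - r for o, r in zip(original, reconstructed)]
    if len(residual) % n_half:
        raise ValueError("length must be a multiple of n_half")
    window = sine_window(2 * n_half)
    padded = [0.0] * n_half + residual + [0.0] * n_half
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n_half, n_half):
        block = [w * v for w, v in zip(window, padded[start:start + 2 * n_half])]
        coeffs = mdct(block)
        if step is not None:
            # Hypothetical uniform scalar quantizer.
            coeffs = [step * round(c / step) for c in coeffs]
        # Windowed overlap-add cancels the time-domain aliasing.
        for i, v in enumerate(imdct(coeffs)):
            out[start + i] += window[i] * v
    return out[n_half:len(padded) - n_half]
```

Adding the decoded residual to the reconstructed signal recovers the original exactly when no quantization is applied, and approximately when `step` is set.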
Step S1310: The low-band and high-band signal components are processed by a multiplexer to obtain the layered coded bitstream of the background noise, which is then transmitted.
The encoder system of this embodiment of the present invention is shown in FIG. 14.
Step S1311: At the decoder, the coding parameters are decoded from the received bitstream and the type of the current frame is determined. If the current frame is a speech frame, the audio signal is decoded with the speech frame decoding algorithm. If the current frame is a noise frame, go to step S1312.
During transmission, a media gateway may, depending on the transmission characteristics of the channel, discard the outer-layer coded bits of a noise frame when necessary without affecting the decoding of the inner-layer bits. The decoder decodes according to the bitstream it actually receives.
Step S1312: The coding parameters of the received noise frame are reconstructed, and the background noise signal is reconstructed from them.
Specifically, if the received noise frame contains only the narrowband core layer, or both the narrowband core layer and the narrowband enhancement layer, the received noise frame is CELP-decoded to obtain the decoded low-band signal component, which is upsampled to a full-band signal and spectrally extended to give the reconstructed background noise signal.
If the received noise frame also contains the wideband core layer, or the wideband core layer plus the wideband enhancement layer, the low-band coding parameters of the received noise frame are decoded with the CELP decoding algorithm into the low-band signal component, which is upsampled and spectrally extended to give a full-band signal. The high-band coding parameters of the received noise frame (that is, the MDCT coefficients) are dequantized and inverse-MDCT transformed to give the residual signal, which is added to the full-band signal reconstructed from the low-band signal component to obtain the final reconstructed full-band background noise.
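The layered behaviour in steps S1311 and S1312 (a gateway may drop outer layers, and the decoder picks a path from what actually arrives) can be sketched as follows. The layer names, the dictionary representation of a frame, and the path labels are hypothetical; only the ordering of the layers and the two decoding paths come from the text.

```python
# Layers from innermost to outermost, as named in the description.
LAYER_ORDER = ("nb_core", "nb_enh", "wb_core", "wb_enh")


def truncate_layers(frame_layers, keep):
    """Gateway side: keep the innermost `keep` layers and drop the rest."""
    return {name: frame_layers[name]
            for name in LAYER_ORDER[:keep] if name in frame_layers}


def choose_decoding_path(frame_layers):
    """Decoder side: select the reconstruction path for a noise frame."""
    if "nb_core" not in frame_layers:
        raise ValueError("the narrowband core layer is required")
    if "wb_core" in frame_layers:
        # CELP decoding, upsampling and spectral extension, then the
        # inverse-MDCT residual is added.
        return "celp_plus_mdct_residual"
    # CELP decoding, upsampling and spectral extension only.
    return "celp_only"
```

Dropping the two wideband layers of a full frame with `truncate_layers(frame, 2)` switches the decoder from the residual path to the CELP-only path without any change to the inner layers.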
The block diagram of the decoder system of this embodiment is shown in FIG. 15.
With the method and apparatus provided by the above embodiments, the encoder selects, according to the transmission mode of the current noise frame, the noise frames that need to be encoded and applies layered coding to them, so that background noise frames are encoded with bandwidth scalability. The decoder decodes the coding parameters of a received layer-coded noise frame according to its transmission mode and reconstructs the background noise, so that background noise is decoded with bandwidth scalability.
Embodiment 4 of the present invention provides a codec system, including:
An encoder 10, configured to, when a received audio frame is a noise frame, select the noise frame to be encoded according to the transmission mode of the current noise frame and perform layered coding on the noise frame to be encoded.
A decoder 20, configured to, when an audio frame received from the encoder is a layer-coded noise frame, decode the coding parameters of the noise frame according to the transmission mode of the current noise frame and reconstruct the background noise from the coding parameters.
Embodiment 5 of the present invention provides an encoder, as shown in FIG. 16, including: a selection unit 11, configured to, when a received audio frame is a noise frame, select the noise frame to be encoded according to the transmission mode of the current frame and send the selection result to the encoding unit; and an encoding unit 12, configured to perform layered coding on the noise frame to be encoded according to the result sent by the selection unit.
The encoder further includes: a determining unit 13, configured to determine the type of the currently received audio frame and, when the audio frame is a noise frame and the previous frame was a speech frame, to send received noise frames to the speech frame encoding unit during a specific frame time and to the selection unit 11 after that frame time; and a speech frame encoding unit 14, configured to, after receiving a noise frame sent by the determining unit 13, encode the noise frame with the speech coding algorithm at a reduced coding rate and buffer the coding parameters of the received noise frame.
The encoding unit 12 further includes: a low-band coding sub-unit 121, configured to perform core layer coding on the low-band signal component of the noise frame; and a high-band coding sub-unit 122, configured to perform extension layer coding on the high-band signal component of the noise frame encoded by the core layer coding sub-unit.
Embodiment 6 of the present invention provides a decoder, as shown in FIG. 17, including:
A decoding unit 21, configured to, when a received audio frame is a layer-coded noise frame, decode the coding parameters of the noise frame according to the transmission mode of the current noise frame. A reconstruction unit 22, configured to reconstruct the background noise according to the coding parameters of the noise frame sent by the decoding unit.
Specifically, the reconstruction unit 22 further includes: a low-band sub-unit 221, configured to, when the received noise frame contains only the narrowband core layer or both the narrowband core layer and the narrowband enhancement layer, reconstruct the low-band signal component of the background noise signal from the low-band coding parameters output by the decoding unit; a high-band sub-unit 222, configured to, when the received noise frame also contains the wideband extension layer, reconstruct the high-band signal component of the background noise signal from the high-band coding parameters output by the decoding unit; and a synthesis sub-unit 223, configured to perform synthesis filtering on the low-band and high-band signal components to obtain the background noise signal.
With the apparatus provided by the above embodiments, the encoder selects, according to the transmission mode of the current noise frame, the noise frames that need to be encoded and applies layered coding to them, so that background noise frames are encoded with bandwidth scalability. The decoder decodes the coding parameters of a received layer-coded noise frame according to its transmission mode and reconstructs the background noise, so that background noise is decoded with bandwidth scalability.
From the description of the above embodiments, those skilled in the art can clearly understand that the present invention may be implemented in hardware, or in software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention may be embodied as a software product, which may be stored on a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes instructions that cause a computer device (such as a personal computer, a server, or a network device) to perform the methods described in the embodiments of the present invention.
In summary, the above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A method for encoding background noise, characterized by comprising:
when a received audio frame is a noise frame, selecting the noise frame to be encoded according to the transmission mode of the current noise frame; and
performing layered coding on the noise frame to be encoded.
2. The method for encoding background noise according to claim 1, characterized in that, before the selecting of the noise frame to be encoded according to the transmission mode of the current noise frame when the received audio frame is a noise frame, the method further comprises:
determining the type of the currently received audio frame; and
when the audio frame is a noise frame and the previous frame is a speech frame, entering a hangover phase, that is, within a specific frame time, encoding the noise frame with the speech frame coding algorithm at a reduced coding rate.
3. The method for encoding background noise according to claim 2, characterized in that, in the hangover phase, the autocorrelation function of the low-band signal component, the low-band coding parameters, and the high-band coding parameters are buffered.
4. The method for encoding background noise according to claim 1, characterized in that the selecting of the noise frame to be encoded according to the transmission mode of the current noise frame is specifically:
when the transmission mode of the current noise frame is discontinuous transmission, evaluating the noise frame and, if it is determined that the noise frame needs to be encoded, selecting the noise frame as the noise frame to be encoded; or
when the transmission mode of the current noise frame is continuous transmission, selecting the current noise frame as the noise frame to be encoded.
5. The method for encoding background noise according to claim 4, characterized in that, when the transmission mode of the current noise frame is discontinuous transmission, the evaluating of the noise frame is specifically:
calculating the distortion of the spectrum and energy of the noise frame relative to the long-term average spectrum and energy; if the distortion exceeds a specific threshold, encoding the noise frame; otherwise, not processing the current frame.
6. The method for encoding background noise according to claim 1, characterized in that the layered coding of the noise frame to be encoded specifically comprises:
performing sub-band filtering on the noise frame to be encoded to split the noise frame into a low-band signal component and a high-band signal component.
7. The method for encoding background noise according to claim 1, characterized in that the layered coding of the noise frame to be encoded specifically comprises:
high-pass filtering the full-band noise signal and downsampling it to obtain a low-band signal component.
8. The method for encoding background noise according to claim 6 or 7, characterized in that narrowband core layer coding is performed on the low-band signal component.
9. The method for encoding background noise according to claim 8, characterized in that the narrowband core layer coding of the low-band signal component specifically comprises:
performing linear prediction analysis on the low-band signal component of the noise frame to obtain linear prediction coefficients and the signal energy;
converting the linear prediction coefficients into spectral parameters and vector quantizing the spectral parameters to obtain quantized spectral parameters; logarithmically quantizing the signal energy to obtain the frame energy; and
using the quantized spectral parameters and the frame energy as the narrowband core layer parameters of the noise frame.
10. The method for encoding background noise according to claim 8, characterized in that the layered coding of the noise frame to be encoded further comprises:
performing narrowband enhancement layer coding on the noise frame, that is, quantizing the quantization error of the spectral parameters and the quantization error of the signal energy in the narrowband core layer.
11. The method for encoding background noise according to claim 8, characterized in that the layered coding of the noise frame to be encoded further comprises:
performing wideband extension layer coding on the noise frame.
12. The method for encoding background noise according to claim 11, characterized in that the wideband extension layer coding of the noise frame specifically comprises:
obtaining the time-domain envelope and the frequency-domain envelope of the high-band signal component, subtracting the quantized time-domain envelope from each component of the frequency-domain envelope, splitting the resulting vector into multiple sub-vectors, and quantizing each sub-vector to obtain the wideband extension layer coding parameters.
13. The method for encoding background noise according to claim 11, characterized in that the wideband extension layer coding of the noise frame specifically uses the time-domain aliasing cancellation (TDAC) coding algorithm to perform wideband extension layer coding on the low-band residual signal component and the high-band signal component of the noise frame, and specifically comprises:
reconstructing the low-band signal component, upsampling the reconstructed low-band signal component and spectrally extending it to obtain a reconstructed wideband signal, applying a modified discrete cosine transform (MDCT) to the residual between the original wideband signal and the reconstructed wideband signal, and quantizing and encoding the resulting MDCT coefficients to obtain the wideband extension layer coding parameters.
14. A method for decoding background noise, characterized by comprising:
when a received audio frame is a layer-coded noise frame, decoding the coding parameters of the noise frame according to the transmission mode of the current noise frame; and
reconstructing the background noise according to the coding parameters.
15. The method for decoding background noise according to claim 14, characterized in that the decoding of the coding parameters of the noise frame according to the transmission mode of the current noise frame is specifically:
when the transmission mode of the current noise frame is discontinuous transmission, decoding the coding parameters of each received noise frame and, for a noise frame that was not transmitted, deriving the coding parameters of the current noise frame from previously received noise frames or from the coding parameters buffered in the hangover phase; or
when the transmission mode of the current noise frame is continuous transmission, decoding the coding parameters from the received noise frame.
16. The method for decoding background noise according to claim 14, characterized in that the reconstructing of the background noise according to the coding parameters is specifically:
when the received noise frame contains only the narrowband core layer, or both the narrowband core layer and the narrowband enhancement layer, calculating the coefficients of a synthesis filter from the reconstructed spectral parameters, using Gaussian random noise as the excitation, performing synthesis filtering with the calculated synthesis filter, and performing time-domain shaping with the reconstructed energy parameter to reconstruct the background noise signal.
17. The method for decoding background noise according to claim 14, characterized in that the reconstructing of the background noise according to the coding parameters is specifically:
when the received noise frame contains only the narrowband core layer, or both the narrowband core layer and the narrowband enhancement layer, performing code-excited linear prediction (CELP) decoding on the low-band coding parameters to obtain the decoded low-band signal component, upsampling the low-band signal component to a full-band signal, and spectrally extending it to reconstruct the background noise signal.
18、 如权利要求 14所述背景噪声的解码方法, 其特征在于, 所 述根据所述编码参数进行背景噪声重建, 进一步包括:  The background noise decoding method according to claim 14, wherein the performing background noise reconstruction according to the encoding parameter further comprises:
所述接收到的噪声帧还包含宽带扩展层时,  When the received noise frame further includes a broadband extension layer,
釆用时域带宽扩展 TDBWE解码算法对所述噪声帧重建出背景 噪声信号; 或  Using the time domain bandwidth extension TDBWE decoding algorithm reconstructs a background noise signal for the noise frame; or
釆用 TDAC解码算法对所述噪声帧重建出背景噪声信号。  The background noise signal is reconstructed from the noise frame using a TDAC decoding algorithm.
19、 如权利要求 18所述背景噪声的解码方法, 其特征在于, 所 述釆用 TDBWE解码算法对所述噪声帧重建出背景噪声信号的方法 具体为:  The method for decoding background noise according to claim 18, wherein the method for reconstructing the background noise signal from the noise frame by using the TDBWE decoding algorithm is specifically:
使用重建出的谱参数计算出合成滤波器的系数,使用高斯随机噪 声作为激励, 通过计算出的合成滤波器进行合成滤波, 并使用重建出 的能量参数进行时域整形, 得到背景噪声信号的低带信号分量; 使用高斯随机噪声作为激励源,利用重建出的高带编码参数对所 述激励源进行时域整形和频域整形,重建出背景噪声信号的高带信号 分量;  Calculate the coefficients of the synthesis filter using the reconstructed spectral parameters, use Gaussian random noise as the excitation, perform synthesis filtering through the calculated synthesis filter, and perform time domain shaping using the reconstructed energy parameters to obtain a low background noise signal. With a signal component; using Gaussian random noise as an excitation source, using the reconstructed high-band coding parameters to perform time domain shaping and frequency domain shaping on the excitation source to reconstruct a high-band signal component of the background noise signal;
将所述重建出的低带信号分量和高带信号分量进行合成滤波,得 到背景噪声信号。  The reconstructed low-band signal component and the high-band signal component are combined and filtered to obtain a background noise signal.
20. The background noise decoding method according to claim 18, wherein reconstructing the background noise signal from the noise frame using the TDAC decoding algorithm specifically comprises:
decoding the low-band signal component from the low-band coding parameters using a CELP decoding algorithm, and upsampling and spectrally extending the low-band signal component to obtain a full-band signal; and
performing inverse quantization and an inverse MDCT on the reconstructed high-band coding parameters to obtain a residual signal, and combining the residual signal with the full-band signal to obtain a wideband background noise signal.
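For illustration only, the inverse-MDCT and combination step of claim 20 can be sketched as follows. This is a sketch under stated assumptions: inverse quantization is omitted, no analysis or synthesis window is applied, and the names `mdct`, `imdct`, and `add_residual` are hypothetical. The forward `mdct` is included only so that the aliasing cancellation of overlap-add can be checked.

```python
import math


def mdct(block):
    """Forward MDCT of 2N samples into N coefficients (used only for checking)."""
    n_half = len(block) // 2
    return [
        sum(x * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n, x in enumerate(block))
        for k in range(n_half)
    ]


def imdct(coeffs):
    """Inverse MDCT of N coefficients into 2N time-aliased samples."""
    n_half = len(coeffs)
    return [
        sum(c * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for k, c in enumerate(coeffs)) / n_half
        for n in range(2 * n_half)
    ]


def add_residual(full_band, prev_tail, coeffs):
    """Overlap-add the IMDCT output with the previous tail, then add to full_band.

    Returns the combined frame and the tail to carry into the next frame.
    """
    y = imdct(coeffs)
    n_half = len(coeffs)
    residual = [prev_tail[i] + y[i] for i in range(n_half)]
    combined = [f + r for f, r in zip(full_band, residual)]
    return combined, y[n_half:]
```

Each call yields N output samples; the second half of the IMDCT output is carried forward so that the time-domain aliasing of adjacent frames cancels in the overlap.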
21. An encoder, comprising:
a selecting unit, configured to, when a received audio frame is a noise frame, select the noise frames to be encoded according to the transmission mode of the current frame, and send the selection result to an encoding unit; and
the encoding unit, configured to perform layered coding on the noise frames to be encoded according to the result sent by the selecting unit.
22. The encoder according to claim 21, further comprising:
a determining unit, configured to determine the type of the currently received audio frame and, when the audio frame is a noise frame and the previous frame is a speech frame, to send received noise frames to a speech frame coding unit within a specific frame time and to send received noise frames to the selecting unit after the specific frame time; and
the speech frame coding unit, configured to, after receiving a noise frame sent by the determining unit, encode the noise frame according to a speech coding algorithm and reduce the coding rate.
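For illustration only, the routing performed by the determining unit of claim 22 can be sketched as follows. This is a sketch under stated assumptions: the class name `FrameRouter`, the destination labels, and the default length of the "specific frame time" (`hangover_frames`) are hypothetical, and a stream that begins with noise is treated the same as noise following speech.

```python
from dataclasses import dataclass

SPEECH, NOISE = "speech", "noise"


@dataclass
class FrameRouter:
    hangover_frames: int = 6  # hypothetical length of the "specific frame time"
    _noise_run: int = 0       # consecutive noise frames since the last speech frame

    def route(self, frame_type: str) -> str:
        """Return the unit that should receive the current frame."""
        if frame_type == SPEECH:
            self._noise_run = 0
            return "speech_encoder"
        self._noise_run += 1
        if self._noise_run <= self.hangover_frames:
            # Noise right after speech: still speech-coded, at a reduced rate.
            return "speech_encoder_reduced_rate"
        # After the specific frame time: hand over to the selecting unit.
        return "selecting_unit"
```

With `hangover_frames=2`, the first two noise frames after a speech frame go to the speech coder at a reduced rate, and the third goes to the selecting unit.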
23. The encoder according to claim 21, wherein the encoding unit further comprises:
a low-band coding subunit, configured to perform core layer coding on the low-band signal component of a noise frame; and
a high-band coding subunit, configured to perform extension layer coding on the high-band signal component of the noise frame encoded by the core layer coding subunit.
24. A decoder, comprising:
a decoding unit, configured to, when a received audio frame is a layered-coded noise frame, decode the coding parameters of the noise frame according to the transmission mode of the current noise frame; and
a reconstruction unit, configured to perform background noise reconstruction according to the coding parameters of the noise frame sent by the decoding unit.
25. The decoder according to claim 24, wherein the reconstruction unit further comprises:
a low-band subunit, configured to, when the received noise frame includes only a narrowband core layer, or includes both a narrowband core layer and a narrowband enhancement layer, reconstruct the low-band signal component of the background noise signal using the low-band coding parameters output by the decoding unit;
a high-band subunit, configured to, when the received noise frame further includes a wideband extension layer, reconstruct the high-band signal component of the background noise signal using the high-band coding parameters output by the decoding unit; and
a synthesis subunit, configured to perform synthesis filtering on the low-band signal component and the high-band signal component to obtain the background noise signal.
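For illustration only, the way the reconstruction unit of claim 25 dispatches on the layers present in a noise frame can be sketched as follows. This is a sketch under stated assumptions: `NoiseFrameParams` and `reconstruct` are hypothetical names, the three subunits are passed in as plain callables, and returning the low-band component alone for a narrowband-only frame is a simplification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NoiseFrameParams:
    low_band: List[float]                    # decoded narrowband-layer parameters
    high_band: Optional[List[float]] = None  # set only if a wideband extension layer is present


def reconstruct(params, low_band_unit, high_band_unit, synthesis_unit):
    """Rebuild background noise from whichever layers the frame carried."""
    low = low_band_unit(params.low_band)
    if params.high_band is None:
        # Narrowband-only frame: no high-band component to reconstruct.
        return low
    high = high_band_unit(params.high_band)
    # Combine both components in the synthesis subunit.
    return synthesis_unit(low, high)
```

Passing the subunits as callables keeps the dispatch logic separate from any particular low-band or high-band reconstruction algorithm.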
26. A codec system, comprising:
an encoder, configured to, when a received audio frame is a noise frame, select the noise frames to be encoded according to the transmission mode of the current noise frame, and perform layered coding on the noise frames to be encoded; and
a decoder, configured to, when an audio frame received from the encoder is a layered-coded noise frame, decode the coding parameters of the noise frame according to the transmission mode of the current noise frame, and perform background noise reconstruction according to the coding parameters.
PCT/CN2008/072939 2007-11-07 2008-11-04 An encoding/decoding method and a device for the background noise WO2009067883A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200710169832.6 2007-11-07
CN 200710169832 CN101430880A (en) 2007-11-07 2007-11-07 Encoding/decoding method and apparatus for ambient noise

Publications (1)

Publication Number Publication Date
WO2009067883A1 (en) 2009-06-04

Family

ID=40646234

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2008/072939 WO2009067883A1 (en) 2007-11-07 2008-11-04 An encoding/decoding method and a device for the background noise

Country Status (2)

Country Link
CN (1) CN101430880A (en)
WO (1) WO2009067883A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10115406B2 (en) 2013-06-10 2018-10-30 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V Apparatus and method for audio signal envelope encoding, processing, and decoding by splitting the audio signal envelope employing distribution quantization and coding
US11776551B2 (en) 2013-06-21 2023-10-03 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for improved signal fade out in different domains during error concealment
CN117672247A (en) * 2024-01-31 2024-03-08 中国电子科技集团公司第十五研究所 Method and system for filtering narrowband noise through real-time audio

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101826331B1 (en) 2010-09-15 2018-03-22 삼성전자주식회사 Apparatus and method for encoding and decoding for high frequency bandwidth extension
MY186055A (en) * 2010-12-29 2021-06-17 Samsung Electronics Co Ltd Coding apparatus and decoding apparatus with bandwidth extension
EP2709101B1 (en) * 2012-09-13 2015-03-18 Nxp B.V. Digital audio processing system and method
PL3550562T3 (en) * 2013-02-22 2021-05-31 Telefonaktiebolaget Lm Ericsson (Publ) Methods and apparatuses for dtx hangover in audio coding
CN106169297B (en) 2013-05-30 2019-04-19 华为技术有限公司 Coding method and equipment
KR101789083B1 (en) 2013-06-10 2017-10-23 프라운호퍼 게젤샤프트 쭈르 푀르데룽 데어 안겐반텐 포르슝 에.베. Apparatus and method for audio signal envelope encoding, processing and decoding by modelling a cumulative sum representation employing distribution quantization and coding
EP2980790A1 (en) 2014-07-28 2016-02-03 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for comfort noise generation mode selection
EP3079151A1 (en) * 2015-04-09 2016-10-12 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio encoder and method for encoding an audio signal
CN112863539B (en) * 2019-11-28 2024-04-16 科大讯飞股份有限公司 High-sampling-rate voice waveform generation method, device, equipment and storage medium
CN113066487A (en) * 2019-12-16 2021-07-02 广东小天才科技有限公司 Learning method, system, equipment and storage medium for correcting accent
CN114006874B (en) * 2020-07-14 2023-11-10 ***通信集团吉林有限公司 Resource block scheduling method, device, storage medium and base station
CN112420065B (en) * 2020-11-05 2024-01-05 北京中科思创云智能科技有限公司 Audio noise reduction processing method, device and equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5867815A (en) * 1994-09-29 1999-02-02 Yamaha Corporation Method and device for controlling the levels of voiced speech, unvoiced speech, and noise for transmission and reproduction
JPH11352999A (en) * 1998-04-06 1999-12-24 Ricoh Co Ltd Voice compression coding device
CN1428953A (en) * 2002-04-22 2003-07-09 西安大唐电信有限公司 Implement method of multi-channel AMR vocoder and its equipment
CN1922660A (en) * 2004-02-24 2007-02-28 松下电器产业株式会社 Communication device, signal encoding/decoding method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Acoustics, Speech and Signal Processing, IEEE International Conference on, 15 Apr.2007", 15 April 2007, article RAGOT,S. ET AL.: "ITU-T G.729.1: AN 8-32 Kbit/S Scalable Coder Interoperable with G.729 for Wideband Telephony and Voice Over IP." *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10115406B2 (en) 2013-06-10 2018-10-30 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V Apparatus and method for audio signal envelope encoding, processing, and decoding by splitting the audio signal envelope employing distribution quantization and coding
US11776551B2 (en) 2013-06-21 2023-10-03 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for improved signal fade out in different domains during error concealment
US11869514B2 (en) 2013-06-21 2024-01-09 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for improved signal fade out for switched audio coding systems during error concealment
CN117672247A (en) * 2024-01-31 2024-03-08 中国电子科技集团公司第十五研究所 Method and system for filtering narrowband noise through real-time audio
CN117672247B (en) * 2024-01-31 2024-04-02 中国电子科技集团公司第十五研究所 Method and system for filtering narrowband noise through real-time audio

Also Published As

Publication number Publication date
CN101430880A (en) 2009-05-13

Similar Documents

Publication Publication Date Title
WO2009067883A1 (en) An encoding/decoding method and a device for the background noise
AU2018217299B2 (en) Improving classification between time-domain coding and frequency domain coding
CN1957398B (en) Methods and devices for low-frequency emphasis during audio compression based on acelp/tcx
US8532983B2 (en) Adaptive frequency prediction for encoding or decoding an audio signal
KR101854297B1 (en) Audio decoder and method for providing a decoded audio information using an error concealment based on a time domain excitation signal
US8473301B2 (en) Method and apparatus for audio decoding
CN108831501B (en) High frequency encoding/decoding method and apparatus for bandwidth extension
US8718804B2 (en) System and method for correcting for lost data in a digital audio signal
JP6039678B2 (en) Audio signal encoding method and decoding method and apparatus using the same
WO2009117967A1 (en) Coding and decoding methods and devices
WO2009109139A1 (en) A super-wideband extending coding and decoding method, coder and super-wideband extending system
US9047877B2 (en) Method and device for an silence insertion descriptor frame decision based upon variations in sub-band characteristic information
EP3039676A1 (en) Adaptive bandwidth extension and apparatus for the same
WO2010028301A1 (en) Spectrum harmonic/noise sharpness control
KR101801758B1 (en) Audio classification based on perceptual quality for low or medium bit rates
WO2010000179A1 (en) A frequency band expanding method, system and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08855069

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08855069

Country of ref document: EP

Kind code of ref document: A1