WO2006070757A1 - Speech coding apparatus and speech coding method - Google Patents
Speech coding apparatus and speech coding method
- Publication number
- WO2006070757A1 (PCT/JP2005/023809)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- signal
- channel
- monaural
- prediction
- channel signal
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S5/00—Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
Definitions
- the present invention relates to a speech coding apparatus and speech coding method, and more particularly to a speech coding apparatus and speech coding method that generate and encode a monaural signal from a stereo speech input signal.
- a voice coding scheme having a scalable configuration is desired in order to control traffic on the network and realize multicast communication.
- a scalable configuration refers to a configuration in which audio data can be decoded even from partial encoded data on the receiving side.
- a monaural signal is generated from a stereo input signal.
- As a method for generating a monaural signal, for example, there is a method of obtaining a monaural signal by averaging the signals of both channels of a stereo signal (see Non-Patent Document 1).
- Non-Patent Document 1: ISO/IEC 14496-3, "Information technology - Coding of audio-visual objects - Part 3: Audio", Subpart 4, 4.B.14 "Scalable AAC with core coder", pp. 304-305, Sep. 2000.
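As a rough illustration of this averaging method, the following minimal Python sketch averages two channels carrying the same waveform with an inter-channel delay; the signal values, the 20-sample delay, and the name `mono_by_averaging` are illustrative, not taken from the patent:

```python
import numpy as np

def mono_by_averaging(s_ch1: np.ndarray, s_ch2: np.ndarray) -> np.ndarray:
    """Conventional monaural generation: sample-wise average of both channels."""
    return 0.5 * (s_ch1 + s_ch2)

# Two channels carrying the same waveform with an inter-channel delay:
n = np.arange(160)
pulse = np.sin(2 * np.pi * n / 40.0) * (n < 80)
s_ch1 = pulse
s_ch2 = np.roll(pulse, 20)  # delayed copy (illustrative 20-sample delay)

mono = mono_by_averaging(s_ch1, s_ch2)
# With a large inter-channel delay the average superimposes time-shifted
# copies, so the mono waveform can differ in shape from either input channel.
```

This is exactly the superposition effect the description criticizes below: the average of time-shifted copies need not resemble either input waveform.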
- However, when the signals of both channels are simply averaged, the resulting monaural signal may have a waveform significantly different from that of the input stereo signal.
- In that case, a signal degraded from, or different from, the input signal that should be transmitted may be transmitted.
- Moreover, a monaural signal in which the input stereo signal is distorted, or whose waveform shape differs significantly from that of the input stereo signal, is poorly matched to a coding model adapted to the characteristics specific to speech signals, such as CELP coding, so that coding efficiency decreases.
- An object of the present invention is to provide a speech encoding apparatus and speech encoding method that can generate a suitable monaural signal from a stereo signal and suppress a decrease in encoding efficiency of the monaural signal.
- The speech coding apparatus of the present invention takes, as an input signal, a stereo signal including a first channel signal and a second channel signal, and adopts a configuration comprising: first generation means for generating a monaural signal from the first channel signal and the second channel signal, based on a time difference between the first channel signal and the second channel signal and an amplitude ratio between the first channel signal and the second channel signal; and encoding means for encoding the monaural signal.
- According to the present invention, an appropriate monaural signal can be generated from a stereo signal, and a decrease in the coding efficiency of the monaural signal can be suppressed.
- FIG. 1 is a block diagram showing a configuration of a speech encoding apparatus according to Embodiment 1 of the present invention.
- FIG. 2 is a block diagram showing a configuration of a monaural signal generation unit according to Embodiment 1 of the present invention.
- FIG. 3 is a signal waveform diagram according to Embodiment 1 of the present invention.
- FIG. 4 is a block diagram showing a configuration of a monaural signal generation unit according to Embodiment 1 of the present invention.
- FIG. 5 is a block diagram showing a configuration of a speech coding apparatus according to Embodiment 2 of the present invention.
- FIG. 6 is a block diagram showing the configuration of the 1st ch and 2nd ch prediction signal synthesis units according to Embodiment 2 of the present invention.
- FIG. 8 is a block diagram showing a configuration of a speech decoding apparatus according to Embodiment 2 of the present invention.
- FIG. 9 is a block diagram showing a configuration of a speech encoding apparatus according to Embodiment 3 of the present invention.
- FIG. 10 is a block diagram showing a configuration of a monaural signal generation unit according to Embodiment 4 of the present invention.
- FIG. 11 is a block diagram showing a configuration of a speech coding apparatus according to Embodiment 5 of the present invention.
- FIG. 12 is a block diagram showing a configuration of a speech decoding apparatus according to Embodiment 5 of the present invention.
- a speech encoding apparatus 10 shown in FIG. 1 includes a monaural signal generation unit 101 and a monaural signal encoding unit 102.
- The monaural signal generation unit 101 generates a monaural signal from the stereo input audio signal (the 1st ch audio signal and the 2nd ch audio signal) and outputs it to the monaural signal encoding unit 102. Details of the monaural signal generation unit 101 will be described later.
- the monaural signal encoding unit 102 encodes the monaural signal, and outputs monaural signal encoded data that is audio encoded data for the monaural signal.
- The monaural signal encoding unit 102 can encode the monaural signal using an arbitrary encoding scheme.
- For example, a scheme based on CELP coding, which is suited to efficient coding of speech signals, can be used.
- Other speech coding schemes, or audio coding schemes such as AAC (Advanced Audio Coding), may also be used.
- the monaural signal generation unit 101 includes an inter-channel prediction analysis unit 201, an intermediate prediction parameter generation unit 202, and a monaural signal calculation unit 203.
- The inter-channel prediction analysis unit 201 obtains, by analysis, prediction parameters between the two channels from the 1st ch audio signal and the 2nd ch audio signal.
- These prediction parameters enable mutual prediction between the channel signals using the correlation between the 1st ch audio signal and the 2nd ch audio signal, and are based on the delay difference and the amplitude ratio between both channels.
- The 1st ch audio signal sp_ch1(n) predicted from the 2nd ch audio signal s_ch2(n), and the 2nd ch audio signal sp_ch2(n) predicted from the 1st ch audio signal s_ch1(n), are expressed by Equations (1) and (2):
- sp_ch1(n) = g1 · s_ch2(n − D1) … (1)
- sp_ch2(n) = g2 · s_ch1(n − D2) … (2)
- Here, the delay time differences D1 and D2 and the average amplitude ratios g1 and g2 are the prediction parameters:
- sp_ch1(n): the prediction signal of the 1st ch
- g1: the amplitude ratio of the 1st ch input signal to the 2nd ch input signal
- The prediction parameters are obtained so as to minimize the distortions Dist1 and Dist2:
- Dist1 = Σ { s_ch1(n) − sp_ch1(n) }² … (3)
- Dist2 = Σ { s_ch2(n) − sp_ch2(n) }² … (4)
- Note that, instead of obtaining prediction parameters that minimize the distortions Dist1 and Dist2, the inter-channel prediction analysis unit 201 may obtain, as prediction parameters, the delay time difference that maximizes the cross-correlation between the channel signals, and the average amplitude ratio between the channel signals in units of frames.
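The cross-correlation alternative just described can be sketched as follows; this is an illustrative Python sketch assuming an exhaustive delay search over circular shifts, and the function name `analyze_interchannel`, the search range, and the test signals are hypothetical, not part of the patent:

```python
import numpy as np

def analyze_interchannel(s_ch1, s_ch2, max_delay=40):
    """Frame-wise prediction parameters: the delay maximizing the
    cross-correlation between channels, and the average-amplitude ratio.
    Circular shifts (np.roll) are used as a simplification."""
    best_d, best_c = 0, -np.inf
    for d in range(-max_delay, max_delay + 1):
        c = np.sum(s_ch1 * np.roll(s_ch2, d))  # correlation at lag d
        if c > best_c:
            best_d, best_c = d, c
    g = np.mean(np.abs(s_ch1)) / (np.mean(np.abs(s_ch2)) + 1e-12)
    return best_d, g

# ch2 is ch1 delayed by 10 samples at half amplitude (illustrative):
rng = np.random.default_rng(0)
s_ch1 = rng.standard_normal(160)
s_ch2 = 0.5 * np.roll(s_ch1, 10)
d1, g1 = analyze_interchannel(s_ch1, s_ch2)
# d1 recovers the lag aligning ch2 to ch1; g1 recovers the amplitude ratio.
```

With the model of Equation (1), `d1` plays the role of D1 and `g1` the role of the amplitude ratio g1.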
- The intermediate prediction parameter generation unit 202 computes, from the prediction parameters D1, D2, g1, and g2, intermediate prediction parameters so that the finally generated monaural signal becomes a signal intermediate between the 1st ch audio signal and the 2nd ch audio signal.
- Intermediate prediction parameters: D1m and g1m are the intermediate prediction parameters (delay time difference, amplitude ratio) relative to the 1st ch, and D2m and g2m are those relative to the 2nd ch.
- The intermediate prediction parameters may be obtained from only the amplitude ratios g1 and g2.
- Alternatively, the amplitude ratios g1m and g2m may be set to fixed values.
- D1m, D2m, g1m, and g2m may also be averaged over time.
- As the method for calculating the intermediate prediction parameters, any method other than the above may be used, as long as it yields values near the middle of the delay time difference and of the amplitude ratio between the 1st ch and the 2nd ch.
- The monaural signal calculation unit 203 calculates the monaural signal s_mono(n) by Equation (13), using the intermediate prediction parameters obtained by the intermediate prediction parameter generation unit 202:
- s_mono(n) = { g1m · s_ch1(n − D1m) + g2m · s_ch2(n − D2m) } / 2 … (13)
- the monaural signal may be calculated from only the input audio signals of one channel.
- FIG. 3 shows an example of the waveform 31 of the 1st ch audio signal and the waveform 32 of the 2nd ch audio signal input to the monaural signal generation unit 101.
- The monaural signal generated from these by the monaural signal generation unit 101 is shown as waveform 33 in the figure.
- Waveform 34 is a monaural signal generated by the conventional method of simply averaging the 1st ch audio signal and the 2nd ch audio signal.
- The waveform 33 of the monaural signal obtained by the monaural signal generation unit 101 is similar to both the 1st ch audio signal and the 2nd ch audio signal, and has an intermediate delay time and amplitude.
- In contrast, the monaural signal generated by the conventional method (waveform 34) is less similar in waveform to the 1st ch and 2nd ch audio signals than waveform 33.
- This is because the monaural signal (waveform 33), generated so that its delay time difference and amplitude ratio take values intermediate between both channels, reflects the spatial characteristics of the audio signals of both channels, and is therefore a more appropriate monaural signal, that is, a signal similar to the input signal with little distortion, compared with a monaural signal generated without considering those spatial characteristics (waveform 34).
- On the other hand, a monaural signal (waveform 34) generated by simply averaging the signals of both channels, without considering the delay time difference or amplitude ratio between the channel signals, superimposes the audio signals of both channels shifted in time when the delay time difference between them is large; the input audio signal is thus distorted, or the waveform differs greatly from it. As a result, when such a monaural signal is encoded with a coding model matched to the characteristics of speech signals, such as CELP coding, the coding efficiency decreases.
- The monaural signal (waveform 33) obtained by the monaural signal generation unit 101 is adjusted so as to reduce the delay time difference between the audio signals of both channels, and thus becomes a signal similar to the input signal with little distortion. A decrease in coding efficiency when encoding this monaural signal can therefore be suppressed.
- the monaural signal generation unit 101 may be configured as follows.
- As the prediction parameters, other parameters may be used in addition to the delay time difference and the amplitude ratio.
- The 1st ch audio signal and the 2nd ch audio signal may be band-divided into two or more frequency bands to generate input signals for each band, and a monaural signal may be generated for each of all or some of the band signals in the same manner as described above.
- The intermediate prediction parameters obtained by the intermediate prediction parameter generation unit 202 may be quantized and transmitted together with the encoded data, so that they can be used for the necessary operations in subsequent encoding and decoding.
- For this purpose, the monaural signal generation unit 101 may be provided with an intermediate prediction parameter quantization unit 204 that quantizes the intermediate prediction parameters and outputs quantized intermediate prediction parameters and an intermediate prediction parameter quantized code.
- FIG. 5 shows the configuration of the speech encoding apparatus according to the present embodiment.
- Speech coding apparatus 500 shown in FIG. 5 includes a core layer coding unit 510 for a monaural signal and an enhancement layer coding unit 520 for a stereo signal.
- Core layer encoding section 510 includes speech encoding apparatus 10 (FIG. 1: monaural signal generation section 101 and monaural signal encoding section 102) according to Embodiment 1.
- In core layer encoding section 510, monaural signal generation section 101 generates the monaural signal s_mono(n) as described in Embodiment 1 and outputs it to monaural signal encoding section 102.
- The monaural signal encoding unit 102 encodes the monaural signal and outputs the encoded data of the monaural signal to the monaural signal decoding unit 511. This encoded data of the monaural signal is multiplexed with the quantized codes and encoded data output from enhancement layer encoding section 520, and transmitted to the speech decoding apparatus as encoded data.
- The monaural signal decoding unit 511 generates a monaural decoded signal from the encoded data of the monaural signal and outputs it to the enhancement layer encoding unit 520.
- 1st ch prediction parameter analysis section 521 obtains and quantizes the 1st ch prediction parameters from the 1st ch audio signal s_ch1(n) and the monaural decoded signal, and outputs the 1st ch prediction quantization parameters to the 1st ch prediction signal synthesis unit 522.
- The 1st ch prediction parameter analysis unit 521 also outputs a 1st ch prediction parameter quantized code obtained by encoding the 1st ch prediction quantization parameters. This 1st ch prediction parameter quantized code is multiplexed with the other encoded data and quantized codes and transmitted to the speech decoding apparatus as encoded data.
- 1st ch prediction signal synthesis section 522 synthesizes the 1st ch prediction signal from the monaural decoded signal and the 1st ch prediction quantization parameters, and outputs the 1st ch prediction signal to subtractor 523. Details of the 1st ch prediction signal synthesis unit 522 will be described later.
- The subtractor 523 obtains the difference between the 1st ch audio signal as the input signal and the 1st ch prediction signal, that is, the signal of the residual component of the 1st ch prediction signal relative to the 1st ch input audio signal (the 1st ch prediction residual signal), and outputs it to the 1st ch prediction residual signal encoding unit 524.
- The 1st ch prediction residual signal encoding unit 524 encodes the 1st ch prediction residual signal and outputs 1st ch prediction residual encoded data. This 1st ch prediction residual encoded data is multiplexed with the other encoded data and quantized codes and transmitted to the speech decoding apparatus as encoded data.
- The 2nd ch prediction parameter analysis unit 525 obtains and quantizes the 2nd ch prediction parameters from the 2nd ch audio signal s_ch2(n) and the monaural decoded signal, and outputs the 2nd ch prediction quantization parameters to the 2nd ch prediction signal synthesis unit 526. 2nd ch prediction parameter analysis section 525 also outputs a 2nd ch prediction parameter quantized code obtained by encoding the 2nd ch prediction quantization parameters. This 2nd ch prediction parameter quantized code is multiplexed with the other encoded data and quantized codes and transmitted to the speech decoding apparatus as encoded data.
- Second channel predicted signal synthesis section 526 synthesizes the second channel predicted signal from the monaural decoded signal and the second channel predicted quantization parameter, and outputs the second channel predicted signal to subtractor 527. Details of the 2ch predicted signal synthesis unit 526 will be described later.
- The subtractor 527 obtains the difference between the 2nd ch audio signal as the input signal and the 2nd ch prediction signal, that is, the signal of the residual component of the 2nd ch prediction signal relative to the 2nd ch input audio signal (the 2nd ch prediction residual signal), and outputs it to the 2nd ch prediction residual signal encoding unit 528.
- Second channel prediction residual signal encoding unit 528 encodes the second channel prediction residual signal and outputs second channel prediction residual encoded data.
- This second channel prediction residual encoded data is multiplexed with other encoded data and quantized code and transmitted to the speech decoding apparatus as encoded data.
- The configurations of the 1st ch prediction signal synthesis unit 522 and the 2nd ch prediction signal synthesis unit 526 are as shown in FIG. 6 (Configuration example 1) or FIG. 7 (Configuration example 2).
- In both configurations, the delay difference (D samples) and the amplitude ratio (g) of each channel signal relative to the monaural signal are used as the prediction quantization parameters.
- The prediction signal of each channel is synthesized from the monaural signal.
- In Configuration example 1, as shown in FIG. 6, the 1st ch prediction signal synthesis unit 522 and the 2nd ch prediction signal synthesis unit 526 each include a delay unit 531 and a multiplier 532, and synthesize the prediction signal sp_ch(n) of each channel from the monaural decoded signal sd_mono(n) by the prediction expressed by Equation (16):
- sp_ch(n) = g · sd_mono(n − D) … (16)
- In Configuration example 2, delay units 533-1 to 533-P, multipliers 534-1 to 534-P, and an adder 535 are provided in addition to the configuration shown in FIG. 6.
- Using the prediction coefficient sequence {a(0), a(1), a(2), ..., a(P)} (P: prediction order, a(0) = 1.0), the prediction signal sp_ch(n) of each channel is synthesized from the monaural decoded signal sd_mono(n) by the prediction expressed by Equation (17):
- sp_ch(n) = Σ_{k=0}^{P} a(k) · g · sd_mono(n − D − k) … (17)
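The delay-and-gain prediction of Equation (16), and its extension with a short prediction-coefficient sequence as in Equation (17), can be sketched as follows; this is an illustrative Python sketch (circular shifts stand in for the delays, and the function names and test values are assumptions):

```python
import numpy as np

def synth_simple(sd_mono, d, g):
    """Configuration example 1, Equation (16): sp_ch(n) = g * sd_mono(n - D)."""
    return g * np.roll(sd_mono, d)

def synth_fir(sd_mono, d, g, a):
    """Configuration example 2, Equation (17) as reconstructed here:
    sp_ch(n) = sum_k a[k] * g * sd_mono(n - D - k), with a[0] = 1.0."""
    return sum(ak * g * np.roll(sd_mono, d + k) for k, ak in enumerate(a))

# Illustrative monaural decoded signal and parameters:
rng = np.random.default_rng(0)
sd_mono = rng.standard_normal(160)
p1 = synth_simple(sd_mono, 3, 0.8)
p2 = synth_fir(sd_mono, 3, 0.8, [1.0])  # order-0 case reduces to Equation (16)
```

With a prediction order of zero (coefficient sequence `[1.0]`), the two configurations produce identical prediction signals, which is why Configuration example 2 is described as an extension of Configuration example 1.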
- The 1st ch prediction parameter analysis unit 521 and the 2nd ch prediction parameter analysis unit 525 obtain prediction parameters that minimize the distortions Dist1 and Dist2 expressed by Equations (3) and (4), and output the prediction quantization parameters obtained by quantizing those prediction parameters to the 1st ch prediction signal synthesis unit 522 and the 2nd ch prediction signal synthesis unit 526 configured as described above.
- The 1st ch prediction parameter analysis unit 521 and the 2nd ch prediction parameter analysis unit 525 also output prediction parameter quantized codes obtained by encoding the prediction quantization parameters.
- Alternatively, the 1st ch prediction parameter analysis unit 521 and the 2nd ch prediction parameter analysis unit 525 may obtain, as prediction parameters, the delay difference D that maximizes the cross-correlation between the monaural decoded signal and the input audio signal of each channel, and the average amplitude ratio g in units of frames.
- Speech decoding apparatus 600 shown in FIG. 8 includes a core layer decoding section 610 for the monaural signal and an enhancement layer decoding section 620 for the stereo signal.
- The monaural signal decoding unit 611 decodes the encoded data of the input monaural signal, outputs the monaural decoded signal to the enhancement layer decoding unit 620, and also outputs it as a final output.
- The 1st ch prediction parameter decoding unit 621 decodes the input 1st ch prediction parameter quantized code and outputs the 1st ch prediction quantization parameters to the 1st ch prediction signal synthesis unit 622.
- The 1st ch prediction signal synthesis unit 622 has the same configuration as the 1st ch prediction signal synthesis unit 522 of speech coding apparatus 500; it predicts the 1st ch audio signal from the monaural decoded signal and the 1st ch prediction quantization parameters, and outputs the 1st ch predicted audio signal to the adder 624.
- The 1st ch prediction residual signal decoding unit 623 decodes the input 1st ch prediction residual encoded data and outputs the 1st ch prediction residual signal to the adder 624.
- Adder 624 adds the 1st ch predicted audio signal and the 1st ch prediction residual signal to obtain the 1st ch decoded signal, and outputs it as a final output.
- second channel prediction parameter decoding section 625 decodes the input second channel prediction parameter quantization code, and outputs the second channel prediction quantization parameter to second channel prediction signal synthesis section 626.
- The 2nd ch prediction signal synthesis unit 626 has the same configuration as the 2nd ch prediction signal synthesis unit 526 of speech coding apparatus 500; it predicts the 2nd ch audio signal from the monaural decoded signal and the 2nd ch prediction quantization parameters, and outputs the 2nd ch predicted audio signal to the adder 628.
- Second channel prediction residual signal decoding section 627 decodes the input second channel prediction residual code data and outputs the second channel prediction residual signal to adder 628.
- Adder 628 adds the second channel predicted speech signal and the second channel predicted residual signal to obtain a second channel decoded signal, and outputs it as the final output.
- In speech decoding apparatus 600 having such a configuration, in a monaural-stereo scalable configuration, when the output audio is monaural, a decoded signal obtained only from the encoded data of the monaural signal is output as the monaural decoded signal; when the output audio is stereo, the 1st ch decoded signal and the 2nd ch decoded signal are decoded and output using all of the received encoded data and quantized codes.
- CELP coding may be used for coding of the core layer and coding of the enhancement layer.
- In this case, the LPC prediction residual signal of each channel may be predicted using the monaural driving excitation (sound source) signal obtained by CELP coding.
- Alternatively, the excitation signal may be encoded in the frequency domain instead of performing the driving excitation search in the time domain.
- The signal of each channel, or the LPC prediction residual signal of each channel, may also be predicted using the intermediate prediction parameters obtained by the monaural signal generation unit 101, together with the monaural decoded signal or the monaural driving excitation signal obtained by CELP coding of the monaural signal.
- Encoding using prediction from the monaural signal as described above may be performed on only one of the channel signals of the stereo input signal.
- In that case, the speech decoding apparatus can generate a decoded signal of the other channel from the decoded monaural signal and the decoded one-channel signal, based on the relationship between the stereo input signal and the monaural signal (Equation (12), etc.).
- In the present embodiment, the speech coding apparatus uses the delay time difference and the amplitude ratio between the monaural signal and the signal of each channel as prediction parameters, and quantizes the 2nd ch prediction parameters using the 1st ch prediction quantization parameters.
- FIG. 9 shows the configuration of speech coding apparatus 700 according to the present embodiment.
- the same components as those in Embodiment 2 (FIG. 5) are denoted by the same reference numerals, and the description thereof is omitted.
- The 2nd ch prediction parameter analysis unit 701 quantizes the 2nd ch prediction parameters based on the relationship (dependency) between the 1st ch prediction parameters and the 2nd ch prediction parameters.
- Specifically, the 2nd ch prediction parameters are estimated from the 1st ch prediction quantization parameters obtained by the 1st ch prediction parameter analysis unit 521, and efficient quantization is performed using that estimate. More specifically, this is done as follows.
- Let the 1st ch prediction quantization parameters (delay time difference, amplitude ratio) obtained by the 1st ch prediction parameter analysis unit 521 be Dq1 and gq1, and let the 2nd ch prediction parameters (before quantization) obtained by analysis be D2 and g2. Since the monaural signal is generated as a signal intermediate between the 1st ch audio signal and the 2nd ch audio signal as described above, the 1st ch prediction parameters and the 2nd ch prediction parameters are strongly related. Therefore, the estimated 2nd ch prediction parameters Dp2 and gp2 are obtained by Equations (18) and (19) using the 1st ch prediction quantization parameters.
- δD2 = D2 − Dp2 … (20)
- Equations (18) and (19) are examples; the estimation and quantization of the 2nd ch prediction parameters may be performed by another method that uses the relationship (dependency) between the 1st ch prediction parameters and the 2nd ch prediction parameters.
- Alternatively, a codebook that combines the 1st ch prediction parameters and the 2nd ch prediction parameters as a set may be prepared, and quantization may be performed by vector quantization.
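Equations (18) and (19) themselves do not survive in this text. One plausible instantiation, assuming that because the monaural signal sits midway between the channels the 2nd ch parameters mirror the quantized 1st ch ones, is sketched below; the mirrored forms and the multiplicative gain residual are assumptions, not confirmed by the surviving text (only the delay residual of Equation (20) is):

```python
def estimate_ch2_params(dq1, gq1):
    """Hypothetical estimators standing in for Equations (18)-(19):
    the 2nd ch delay/amplitude are assumed to mirror the quantized
    1st ch parameters about the intermediate monaural signal."""
    dp2 = -dq1          # assumed form of Equation (18)
    gp2 = 1.0 / gq1     # assumed form of Equation (19)
    return dp2, gp2

def ch2_residuals(d2, g2, dp2, gp2):
    """Only the small residuals relative to the estimate are quantized."""
    delta_d2 = d2 - dp2     # Equation (20): delta-D2 = D2 - Dp2
    delta_g2 = g2 / gp2     # assumed multiplicative residual for the gain
    return delta_d2, delta_g2

# Illustrative values: 1st ch quantized delay 4 samples, gain 2.0;
# 2nd ch analysis yields delay -3 samples, gain 0.6.
dp2, gp2 = estimate_ch2_params(4, 2.0)
dd2, dg2 = ch2_residuals(-3, 0.6, dp2, gp2)
```

The point of the scheme is that `dd2` and `dg2` are small compared with the raw parameters, so fewer bits suffice to quantize them, which is the "efficient quantization" the description claims.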
- The analysis and quantization of the 1st ch prediction parameters and the 2nd ch prediction parameters may also be performed using the intermediate prediction parameters obtained by the configuration of FIG. 2 or FIG. 4. In this case, since the 1st ch prediction parameters and the 2nd ch prediction parameters can be estimated in advance, the amount of computation required for the analysis can be reduced.
- The configuration of the speech decoding apparatus according to the present embodiment is substantially the same as that of Embodiment 2 (FIG. 8), except that decoding processing corresponding to the configuration of speech coding apparatus 700 is performed, such as the 2nd ch prediction parameter decoding unit 625 performing decoding using the 1st ch prediction quantization parameters.
- FIG. 10 shows the configuration of monaural signal generation unit 101 according to the present embodiment.
- the same components as those in Embodiment 1 (FIG. 2) are denoted by the same reference numerals, and description thereof is omitted.
- Correlation determining section 801 calculates the degree of correlation between the 1st channel audio signal and the 2nd channel audio signal, and determines whether or not the level of correlation is greater than a threshold value. Correlation determining section 801 controls switching sections 802 and 804 based on the determination result. The calculation of the degree of correlation and the threshold determination are performed, for example, by obtaining the maximum value (normalized value) of the cross-correlation function between signals of each channel and comparing it with a predetermined threshold.
- When the degree of correlation is greater than the threshold, correlation determining section 801 switches switching section 802 so that the 1st ch audio signal and the 2nd ch audio signal are input to inter-channel prediction analysis section 201 and monaural signal calculation section 203, and switches switching section 804 to the monaural signal calculation section 203 side. As a result, when the correlation between the 1st ch and the 2nd ch is greater than the threshold, a monaural signal is generated as described in Embodiment 1.
- When the degree of correlation is equal to or less than the threshold, correlation determination unit 801 switches switching unit 802 so that the 1st ch audio signal and the 2nd ch audio signal are input to average value signal calculation unit 803, and switches switching unit 804 to the average value signal calculation unit 803 side. In this case, the average value signal calculation unit 803 calculates the average value signal s_av(n) of the 1st ch audio signal and the 2nd ch audio signal according to Equation (22), and outputs it as the monaural signal.
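The switching logic of correlation determining section 801 can be sketched as follows; the normalized cross-correlation measure, the threshold value 0.5, the function names, and the test signals are illustrative assumptions:

```python
import numpy as np

def normalized_xcorr_max(s_ch1, s_ch2, max_delay=40):
    """Maximum of the normalized cross-correlation over candidate delays
    (circular shifts as a simplification)."""
    denom = np.sqrt(np.sum(s_ch1 ** 2) * np.sum(s_ch2 ** 2)) + 1e-12
    return max(abs(np.sum(s_ch1 * np.roll(s_ch2, d))) / denom
               for d in range(-max_delay, max_delay + 1))

def select_generation_mode(s_ch1, s_ch2, threshold=0.5):
    """High correlation -> Embodiment 1 generation (units 201-203);
    otherwise -> the average value signal (unit 803, Equation (22))."""
    if normalized_xcorr_max(s_ch1, s_ch2) > threshold:
        return "intermediate"
    return "average"

rng = np.random.default_rng(1)
a = rng.standard_normal(160)
b = np.roll(a, 7)             # a delayed copy: highly correlated
c = rng.standard_normal(160)  # independent noise: low correlation
```

A delayed copy of a channel yields a normalized maximum of 1.0 and selects the intermediate-parameter generation, while independent noise falls below the threshold and selects plain averaging.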
- Since the average value signal of the 1st ch audio signal and the 2nd ch audio signal is used as the monaural signal when the correlation between the two channels is small, sound quality degradation in that case can be prevented.
- Moreover, since encoding is performed in an encoding mode appropriate to the correlation between the two channels, coding efficiency can be improved.
- the monaural signal generated by switching the generation method based on the correlation between the lch and the second ch as described above corresponds to the correlation between the lch and the second ch.
- Scalable encoding may be performed. If the correlation between channel 1 and channel 2 is greater than the threshold, the structure shown in Embodiment 2 or 3 is used to encode the monaural signal in the core layer and use the monaural decoded signal in the enhancement layer. Encoding is performed using the signal prediction of each channel. On the other hand, if the correlation between the 1st channel and the 2nd channel is below the threshold value, it is suitable for the case where the correlation between the two channels is low in the enhancement layer after the code signal is applied to the monaural signal in the core layer.
- Encode with another scalable configuration Coding with another scalable configuration suitable for low correlation is, for example, a method that directly encodes the differential signal between the signal of each channel and the monaural decoded signal without using inter-channel prediction.
- alternatively, there is a method in which the monaural driving excitation signal is used directly in the enhancement layer coding, without using inter-channel prediction.
- the enhancement layer coding unit performs coding coding only for the lch, and uses the quantized intermediate prediction parameter in the coding for the lch. Performs prediction signal synthesis.
- FIG. 11 shows the configuration of speech encoding apparatus 900 according to the present embodiment. In FIG. 11, the same components as those of the second embodiment (FIG. 5) are denoted by the same reference numerals, and the description thereof is omitted.
- monaural signal generation unit 101 employs the configuration shown in FIG. That is, the monaural signal generation unit 101 includes an intermediate prediction parameter quantization unit 204.
- the intermediate prediction parameter quantization unit 204 quantizes the intermediate prediction parameter, and outputs the quantized intermediate prediction parameter and the intermediate prediction parameter quantization code.
- the quantized intermediate prediction parameter is obtained by quantizing D1, D2, g1 and g2. The quantized intermediate prediction parameter is input to the 1st channel prediction signal synthesis unit 901 of enhancement layer coding unit 520.
- the intermediate prediction parameter quantization code is multiplexed with the monaural signal encoded data and the 1st channel prediction residual encoded data, and transmitted to the speech decoding apparatus as encoded data.
- 1st channel prediction signal synthesis unit 901 synthesizes the 1st channel prediction signal from the monaural decoded signal and the quantized intermediate prediction parameter, and outputs the 1st channel prediction signal to subtractor 523. Specifically, 1st channel prediction signal synthesis unit 901 synthesizes the 1st channel prediction signal sp_ch1(n) from the monaural decoded signal sd_mono(n) by the prediction represented by equation (23).
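This excerpt does not reproduce equation (23) itself, but prediction of this kind is commonly realized as a delay-and-gain operation on the monaural decoded signal; the sketch below assumes that form (delay `d` in samples and gain `g` are assumed names), with samples before the start of the frame taken as zero:

```python
def synthesize_prediction_signal(sd_mono, d, g):
    # Assumed form of equation (23): sp_ch1(n) = g * sd_mono(n - d),
    # treating samples before the start of the frame as zero.
    return [g * (sd_mono[n - d] if n - d >= 0 else 0.0)
            for n in range(len(sd_mono))]
```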
- FIG. 12 shows the configuration of speech decoding apparatus 1000 according to the present embodiment.
- the same components as those in the second embodiment are denoted by the same reference numerals, and the description thereof is omitted.
- intermediate prediction parameter decoding section 1001 decodes the input intermediate prediction parameter quantization code, and outputs the quantized intermediate prediction parameter to 1st channel prediction signal synthesis section 1002 and second channel decoded signal generation section 1003.
- 1st channel prediction signal synthesis section 1002 synthesizes the 1st channel prediction signal from the monaural decoded signal and the quantized intermediate prediction parameter, and outputs the 1st channel prediction signal to adder 624. Specifically, 1st channel prediction signal synthesis section 1002, like 1st channel prediction signal synthesis unit 901 of speech encoding apparatus 900, synthesizes the 1st channel prediction signal sp_ch1(n) from the monaural decoded signal sd_mono(n) by the prediction represented by equation (23) above.
- second channel decoded signal generation section 1003 generates a second channel decoded signal from the quantized intermediate prediction parameter, the monaural decoded signal and the first channel decoded signal.
- second channel decoded signal generation section 1003 generates the second channel decoded signal according to equation (24) obtained from the relationship of equation (13) above.
- sd_ch1(n): 1st channel decoded signal.
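Equation (24) is not reproduced in this excerpt; in the simplest case, where the monaural signal is the average of the two channels as in equation (22), the relationship reduces to the form below, which we sketch under that assumption only:

```python
def decode_second_channel(sd_mono, sd_ch1):
    # Assuming sd_mono(n) = (sd_ch1(n) + sd_ch2(n)) / 2, it follows that
    # sd_ch2(n) = 2 * sd_mono(n) - sd_ch1(n).
    return [2.0 * m - c for m, c in zip(sd_mono, sd_ch1)]
```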
- in the present embodiment, enhancement layer coding section 520 synthesizes the prediction signal of only the 1st channel, but a configuration that synthesizes the prediction signal of only the 2nd channel instead of the 1st channel may also be used. That is, in this embodiment, enhancement layer coding section 520 employs a configuration in which only one channel of the stereo signal is encoded.
- as described above, enhancement layer coding section 520 encodes only one channel of the stereo signal, and the prediction parameter used to synthesize the prediction signal of that channel is shared with the intermediate prediction parameter for monaural signal generation, so coding efficiency can be improved. In addition, because enhancement layer coding section 520 encodes only one channel of the stereo signal, a lower bit rate can be achieved than with a configuration that encodes both channels.
- the encoded data is transmitted to speech decoding apparatus 1000, and the parameters D and g are obtained from equations (27) to (
- a plurality of candidates for the intermediate prediction parameter may be prepared, and among them, the intermediate prediction parameter that minimizes the coding distortion after coding in enhancement layer coding section 520 (the distortion in enhancement layer coding section 520 alone, or the sum of the distortion in core layer coding section 510 and the distortion in enhancement layer coding section 520) may be used for the coding in enhancement layer coding section 520. As a result, an optimum parameter that improves the prediction performance when the prediction signal is synthesized in the enhancement layer can be selected, and the sound quality can be further improved.
- the specific procedure is as follows.
- the monaural signal generation unit 101 outputs a plurality of intermediate prediction parameter candidates, and outputs the monaural signal generated corresponding to each candidate. For example, a predetermined number of intermediate prediction parameters are output as the candidates, in ascending order of prediction distortion or in descending order of the cross-correlation between the signals of the channels.
- the monaural signal encoding unit 102 encodes the monaural signal generated corresponding to each of the plurality of intermediate prediction parameter candidates, and outputs, for each candidate, the monaural signal encoded data and its coding distortion (monaural signal coding distortion).
- <Step 3: 1st channel coding>
- enhancement layer coding section 520 synthesizes a plurality of 1st channel prediction signals using the plurality of intermediate prediction parameter candidates and performs 1st channel coding, and outputs, for each candidate, the encoded data (1st channel prediction residual encoded data) and the coding distortion (stereo coding distortion).
- enhancement layer coding section 520 determines, from among the plurality of intermediate prediction parameter candidates, the intermediate prediction parameter with the smallest sum of the coding distortion obtained in step 2 and the coding distortion obtained in step 3 (or the smallest coding distortion obtained in step 2 alone, or in step 3 alone) as the parameter to be used for encoding, and the monaural signal encoded data, intermediate prediction parameter quantization code, and 1st channel prediction residual encoded data corresponding to that intermediate prediction parameter are transmitted to speech decoding apparatus 1000.
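The candidate-selection procedure above can be sketched as follows; a schematic only, in which the two distortion measures stand in for the monaural coding distortion of step 2 and the stereo coding distortion of step 3 (they are caller-supplied callables here, not a patent-defined API):

```python
def select_intermediate_parameter(candidates, mono_distortion, stereo_distortion):
    # Pick the candidate minimizing the sum of the monaural coding
    # distortion (step 2) and the stereo coding distortion (step 3).
    best_cand, best_cost = None, float("inf")
    for cand in candidates:
        cost = mono_distortion(cand) + stereo_distortion(cand)
        if cost < best_cost:
            best_cand, best_cost = cand, cost
    return best_cand
```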
- when the normal monauralization mode is selected, the intermediate prediction parameter is not transmitted; only selection information (a 1-bit selection flag indicating the normal monauralization mode) is transmitted. Encoding that includes the normal monauralization mode as a candidate may also be performed by enhancement layer coding section 520. In this way, optimal encoding based on the coding distortion minimization criterion, with the normal monaural mode included as a candidate, can be achieved, and since the intermediate prediction parameters need not be transmitted when the normal monaural mode is selected, the corresponding bits can be assigned to other encoded data to improve sound quality.
- CELP coding may be used for coding of the core layer and coding of the enhancement layer.
- in this case, the enhancement layer performs prediction of the LPC prediction residual signal of each channel's signal using the monaural encoded driving excitation signal obtained by CELP coding.
- alternatively, instead of performing the driving excitation search in the time domain, the excitation signal may be encoded in the frequency domain.
- the speech encoding apparatus and speech decoding apparatus can also be mounted on a wireless communication apparatus, such as a wireless communication mobile station apparatus or a wireless communication base station apparatus, used in a mobile communication system.
- Each functional block used in the description of each of the above embodiments is typically realized as an LSI, which is an integrated circuit. These blocks may be formed as individual chips, or some or all of them may be integrated into a single chip.
- the integrated circuit may be referred to as an IC, a system LSI, a super LSI, or an ultra LSI, depending on the degree of integration.
- the method of circuit integration is not limited to LSI; implementation using dedicated circuitry or general-purpose processors is also possible, and an FPGA (Field Programmable Gate Array) may also be used.
- the present invention is applicable to uses such as a communication apparatus in a mobile communication system or in a packet communication system using the Internet protocol.
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/722,821 US7797162B2 (en) | 2004-12-28 | 2005-12-26 | Audio encoding device and audio encoding method |
DE602005017660T DE602005017660D1 (de) | 2004-12-28 | 2005-12-26 | Audiokodierungsvorrichtung und audiokodierungsmethode |
EP05819447A EP1821287B1 (en) | 2004-12-28 | 2005-12-26 | Audio encoding device and audio encoding method |
AT05819447T ATE448539T1 (de) | 2004-12-28 | 2005-12-26 | Audiokodierungsvorrichtung und audiokodierungsmethode |
JP2006550770A JP5046653B2 (ja) | 2004-12-28 | 2005-12-26 | 音声符号化装置および音声符号化方法 |
CN2005800450680A CN101091206B (zh) | 2004-12-28 | 2005-12-26 | 语音编码装置和语音编码方法 |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004380980 | 2004-12-28 | ||
JP2004-380980 | 2004-12-28 | ||
JP2005157808 | 2005-05-30 | ||
JP2005-157808 | 2005-05-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2006070757A1 true WO2006070757A1 (ja) | 2006-07-06 |
Family
ID=36614874
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2005/023809 WO2006070757A1 (ja) | 2004-12-28 | 2005-12-26 | 音声符号化装置および音声符号化方法 |
Country Status (8)
Country | Link |
---|---|
US (1) | US7797162B2 (ja) |
EP (2) | EP2138999A1 (ja) |
JP (1) | JP5046653B2 (ja) |
KR (1) | KR20070090219A (ja) |
CN (1) | CN101091206B (ja) |
AT (1) | ATE448539T1 (ja) |
DE (1) | DE602005017660D1 (ja) |
WO (1) | WO2006070757A1 (ja) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008016097A1 (fr) * | 2006-08-04 | 2008-02-07 | Panasonic Corporation | dispositif de codage audio stéréo, dispositif de décodage audio stéréo et procédé de ceux-ci |
WO2008090970A1 (ja) * | 2007-01-26 | 2008-07-31 | Panasonic Corporation | ステレオ符号化装置、ステレオ復号装置、およびこれらの方法 |
WO2009142017A1 (ja) * | 2008-05-22 | 2009-11-26 | パナソニック株式会社 | ステレオ信号変換装置、ステレオ信号逆変換装置およびこれらの方法 |
WO2010016270A1 (ja) * | 2008-08-08 | 2010-02-11 | パナソニック株式会社 | 量子化装置、符号化装置、量子化方法及び符号化方法 |
JP2010541007A (ja) * | 2007-09-25 | 2010-12-24 | モトローラ・インコーポレイテッド | マルチ・チャンネル音響信号をエンコードするための装置および方法 |
WO2014068817A1 (ja) * | 2012-10-31 | 2014-05-08 | パナソニック株式会社 | オーディオ信号符号化装置及びオーディオ信号復号装置 |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4887288B2 (ja) * | 2005-03-25 | 2012-02-29 | パナソニック株式会社 | 音声符号化装置および音声符号化方法 |
BRPI0616624A2 (pt) | 2005-09-30 | 2011-06-28 | Matsushita Electric Ind Co Ltd | aparelho de codificação de fala e método de codificação de fala |
US7991611B2 (en) * | 2005-10-14 | 2011-08-02 | Panasonic Corporation | Speech encoding apparatus and speech encoding method that encode speech signals in a scalable manner, and speech decoding apparatus and speech decoding method that decode scalable encoded signals |
WO2007052612A1 (ja) * | 2005-10-31 | 2007-05-10 | Matsushita Electric Industrial Co., Ltd. | ステレオ符号化装置およびステレオ信号予測方法 |
US20090276210A1 (en) * | 2006-03-31 | 2009-11-05 | Panasonic Corporation | Stereo audio encoding apparatus, stereo audio decoding apparatus, and method thereof |
WO2008007700A1 (fr) | 2006-07-12 | 2008-01-17 | Panasonic Corporation | Dispositif de décodage de son, dispositif de codage de son, et procédé de compensation de trame perdue |
US20100010811A1 (en) * | 2006-08-04 | 2010-01-14 | Panasonic Corporation | Stereo audio encoding device, stereo audio decoding device, and method thereof |
KR101453732B1 (ko) * | 2007-04-16 | 2014-10-24 | 삼성전자주식회사 | 스테레오 신호 및 멀티 채널 신호 부호화 및 복호화 방법및 장치 |
WO2008132850A1 (ja) * | 2007-04-25 | 2008-11-06 | Panasonic Corporation | ステレオ音声符号化装置、ステレオ音声復号装置、およびこれらの方法 |
US8473288B2 (en) * | 2008-06-19 | 2013-06-25 | Panasonic Corporation | Quantizer, encoder, and the methods thereof |
EP2313886B1 (en) * | 2008-08-11 | 2019-02-27 | Nokia Technologies Oy | Multichannel audio coder and decoder |
CN102292769B (zh) * | 2009-02-13 | 2012-12-19 | 华为技术有限公司 | 一种立体声编码方法和装置 |
US9053701B2 (en) | 2009-02-26 | 2015-06-09 | Panasonic Intellectual Property Corporation Of America | Channel signal generation device, acoustic signal encoding device, acoustic signal decoding device, acoustic signal encoding method, and acoustic signal decoding method |
US8666752B2 (en) * | 2009-03-18 | 2014-03-04 | Samsung Electronics Co., Ltd. | Apparatus and method for encoding and decoding multi-channel signal |
CN102157150B (zh) | 2010-02-12 | 2012-08-08 | 华为技术有限公司 | 立体声解码方法及装置 |
CN102157152B (zh) | 2010-02-12 | 2014-04-30 | 华为技术有限公司 | 立体声编码的方法、装置 |
CN109215667B (zh) | 2017-06-29 | 2020-12-22 | 华为技术有限公司 | 时延估计方法及装置 |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH04324727A (ja) * | 1991-04-24 | 1992-11-13 | Fujitsu Ltd | ステレオ符号化伝送方式 |
JP2004325633A (ja) * | 2003-04-23 | 2004-11-18 | Matsushita Electric Ind Co Ltd | 信号符号化方法、信号符号化プログラム及びその記録媒体 |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE19721487A1 (de) * | 1997-05-23 | 1998-11-26 | Thomson Brandt Gmbh | Verfahren und Vorrichtung zur Fehlerverschleierung bei Mehrkanaltonsignalen |
DE19742655C2 (de) * | 1997-09-26 | 1999-08-05 | Fraunhofer Ges Forschung | Verfahren und Vorrichtung zum Codieren eines zeitdiskreten Stereosignals |
SE519981C2 (sv) * | 2000-09-15 | 2003-05-06 | Ericsson Telefon Ab L M | Kodning och avkodning av signaler från flera kanaler |
US7292901B2 (en) * | 2002-06-24 | 2007-11-06 | Agere Systems Inc. | Hybrid multi-channel/cue coding/decoding of audio signals |
SE0202159D0 (sv) * | 2001-07-10 | 2002-07-09 | Coding Technologies Sweden Ab | Efficientand scalable parametric stereo coding for low bitrate applications |
ES2323294T3 (es) * | 2002-04-22 | 2009-07-10 | Koninklijke Philips Electronics N.V. | Dispositivo de decodificacion con una unidad de decorrelacion. |
CN1647156B (zh) * | 2002-04-22 | 2010-05-26 | 皇家飞利浦电子股份有限公司 | 参数编码方法、参数编码器、用于提供音频信号的设备、解码方法、解码器、用于提供解码后的多声道音频信号的设备 |
DE602004002390T2 (de) * | 2003-02-11 | 2007-09-06 | Koninklijke Philips Electronics N.V. | Audiocodierung |
EP1606797B1 (en) | 2003-03-17 | 2010-11-03 | Koninklijke Philips Electronics N.V. | Processing of multi-channel signals |
JP4324727B2 (ja) | 2003-06-20 | 2009-09-02 | カシオ計算機株式会社 | 撮影モードの設定情報転送システム |
JP2005157808A (ja) | 2003-11-26 | 2005-06-16 | Star Micronics Co Ltd | カード保管装置 |
- 2005-12-26 WO PCT/JP2005/023809 patent/WO2006070757A1/ja active Application Filing
- 2005-12-26 EP EP09173155A patent/EP2138999A1/en not_active Withdrawn
- 2005-12-26 DE DE602005017660T patent/DE602005017660D1/de active Active
- 2005-12-26 US US11/722,821 patent/US7797162B2/en active Active
- 2005-12-26 KR KR1020077014866A patent/KR20070090219A/ko not_active Application Discontinuation
- 2005-12-26 EP EP05819447A patent/EP1821287B1/en not_active Not-in-force
- 2005-12-26 AT AT05819447T patent/ATE448539T1/de not_active IP Right Cessation
- 2005-12-26 JP JP2006550770A patent/JP5046653B2/ja not_active Expired - Fee Related
- 2005-12-26 CN CN2005800450680A patent/CN101091206B/zh not_active Expired - Fee Related
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008016097A1 (fr) * | 2006-08-04 | 2008-02-07 | Panasonic Corporation | dispositif de codage audio stéréo, dispositif de décodage audio stéréo et procédé de ceux-ci |
JP4999846B2 (ja) * | 2006-08-04 | 2012-08-15 | パナソニック株式会社 | ステレオ音声符号化装置、ステレオ音声復号装置、およびこれらの方法 |
WO2008090970A1 (ja) * | 2007-01-26 | 2008-07-31 | Panasonic Corporation | ステレオ符号化装置、ステレオ復号装置、およびこれらの方法 |
JP2010541007A (ja) * | 2007-09-25 | 2010-12-24 | モトローラ・インコーポレイテッド | マルチ・チャンネル音響信号をエンコードするための装置および方法 |
WO2009142017A1 (ja) * | 2008-05-22 | 2009-11-26 | パナソニック株式会社 | ステレオ信号変換装置、ステレオ信号逆変換装置およびこれらの方法 |
WO2010016270A1 (ja) * | 2008-08-08 | 2010-02-11 | パナソニック株式会社 | 量子化装置、符号化装置、量子化方法及び符号化方法 |
WO2014068817A1 (ja) * | 2012-10-31 | 2014-05-08 | パナソニック株式会社 | オーディオ信号符号化装置及びオーディオ信号復号装置 |
JPWO2014068817A1 (ja) * | 2012-10-31 | 2016-09-08 | 株式会社ソシオネクスト | オーディオ信号符号化装置及びオーディオ信号復号装置 |
Also Published As
Publication number | Publication date |
---|---|
DE602005017660D1 (de) | 2009-12-24 |
ATE448539T1 (de) | 2009-11-15 |
CN101091206B (zh) | 2011-06-01 |
US7797162B2 (en) | 2010-09-14 |
EP1821287A4 (en) | 2008-03-12 |
JP5046653B2 (ja) | 2012-10-10 |
EP1821287B1 (en) | 2009-11-11 |
KR20070090219A (ko) | 2007-09-05 |
US20080091419A1 (en) | 2008-04-17 |
EP2138999A1 (en) | 2009-12-30 |
JPWO2006070757A1 (ja) | 2008-06-12 |
CN101091206A (zh) | 2007-12-19 |
EP1821287A1 (en) | 2007-08-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2006070757A1 (ja) | 音声符号化装置および音声符号化方法 | |
US11978460B2 (en) | Truncateable predictive coding | |
US7945447B2 (en) | Sound coding device and sound coding method | |
JP4963965B2 (ja) | スケーラブル符号化装置、スケーラブル復号装置、及びこれらの方法 | |
JP4850827B2 (ja) | 音声符号化装置および音声符号化方法 | |
JP5413839B2 (ja) | 符号化装置および復号装置 | |
JP4555299B2 (ja) | スケーラブル符号化装置およびスケーラブル符号化方法 | |
WO2006118179A1 (ja) | 音声符号化装置および音声符号化方法 | |
WO2012066727A1 (ja) | ステレオ信号符号化装置、ステレオ信号復号装置、ステレオ信号符号化方法及びステレオ信号復号方法 | |
JP4887279B2 (ja) | スケーラブル符号化装置およびスケーラブル符号化方法 | |
JP2013137563A (ja) | ストリーム合成装置、復号装置、ストリーム合成方法、復号方法、およびコンピュータプログラム | |
WO2006104017A1 (ja) | 音声符号化装置および音声符号化方法 | |
JPWO2006070760A1 (ja) | スケーラブル符号化装置およびスケーラブル符号化方法 | |
WO2006129615A1 (ja) | スケーラブル符号化装置およびスケーラブル符号化方法 | |
WO2009122757A1 (ja) | ステレオ信号変換装置、ステレオ信号逆変換装置およびこれらの方法 | |
JP2006072269A (ja) | 音声符号化装置、通信端末装置、基地局装置および音声符号化方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| WWE | Wipo information: entry into national phase | Ref document number: 2006550770; Country of ref document: JP |
| WWE | Wipo information: entry into national phase | Ref document number: 11722821; Country of ref document: US |
| WWE | Wipo information: entry into national phase | Ref document number: 200580045068.0; Country of ref document: CN. Ref document number: 2005819447; Country of ref document: EP |
| WWE | Wipo information: entry into national phase | Ref document number: 1020077014866; Country of ref document: KR |
| NENP | Non-entry into the national phase | Ref country code: DE |
| WWP | Wipo information: published in national office | Ref document number: 2005819447; Country of ref document: EP |
| WWP | Wipo information: published in national office | Ref document number: 11722821; Country of ref document: US |