WO2007037361A1 - Speech encoding apparatus and speech encoding method - Google Patents
Speech encoding apparatus and speech encoding method
- Publication number
- WO2007037361A1 (PCT/JP2006/319438, JP2006319438W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- spectrum
- layer
- unit
- section
- encoding
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/06—Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/038—Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/24—Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
Definitions
- the present invention relates to a speech encoding apparatus and speech encoding method.
- As conventional scalable coding, there is one that uses a technique standardized by MPEG-4 (Moving Picture Experts Group phase 4) (for example, see Non-Patent Document 1).
- CELP: Code Excited Linear Prediction
- AAC: Advanced Audio Coding
- TwinVQ: Transform Domain Weighted Interleave Vector Quantization
- In this technique, the frequency band of an audio signal is divided into two subbands, a low band and a high band; the low-band spectrum is copied to the high band, and the copied spectrum is modified so that it can be used as the spectrum of the high band.
- Since the modification information can be encoded with a small number of bits, a low bit rate can be achieved.
- Non-Patent Document 1: Satoshi Miki (ed.), All of MPEG-4, first edition, Industrial Research Co., Ltd., September 30, 1998, pp. 126-127
- Patent Document 1: Japanese Translation of PCT Application No. 2001-521648
- In general, the spectrum of a speech or audio signal can be represented as the product of a component that changes gently with frequency (the spectral envelope) and a component that changes finely (the spectral fine structure).
- Fig. 1 shows the spectrum of an audio signal
- Fig. 2 shows the spectrum envelope
- Fig. 3 shows the spectral fine structure.
- This spectral envelope (Fig. 2) is calculated using 10th-order LPC (Linear Prediction Coding) coefficients. From these figures, it can be seen that the spectrum of the speech signal (Fig. 1) is obtained as the product of the spectral envelope (Fig. 2) and the spectral fine structure (Fig. 3).
- When the bandwidth of the high band that is the copy destination is wider than the bandwidth of the low band that is the copy source,
- the low-band spectrum is copied to the high band more than once.
- When the low-band spectrum is replicated to the high band multiple times in this way, a discontinuity in spectral energy occurs at the junctions of the copied spectra, as shown in Fig. 4.
- The cause of this discontinuity is the spectral envelope. As shown in Fig. 2, the energy of the spectral envelope attenuates as the frequency increases, so the spectrum is tilted. Because of this spectral tilt, when the low-band spectrum is copied multiple times into the high band, discontinuities in spectral energy arise and speech quality deteriorates.
- This discontinuity can be corrected by gain adjustment, but a large number of bits is required to obtain a sufficient effect by gain adjustment.
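The discontinuity described above can be sketched numerically. The following is an illustrative Python example (all spectrum values are hypothetical, not from the patent): a tilted low-band spectrum copied twice into the high band produces an energy jump at each seam, while a flattened copy does not.

```python
# Illustrative sketch: replicating a tilted low-band spectrum into the
# high band creates an energy jump at each seam; flattening the low band
# first removes the jump. All values are hypothetical.

FL = 8           # low-band width (bins 0..FL-1)
FH = 24          # full-band width; high band FL..FH-1 holds two copies

# A low-band magnitude spectrum with a downward tilt (spectral envelope).
low = [1.0 - 0.1 * k for k in range(FL)]           # 1.0, 0.9, ..., 0.3

# Naive replication: copy the tilted low band twice into the high band.
naive = low + low + low                            # bins 0..FH-1

# Energy discontinuity at the first seam: bin FL-1 vs bin FL.
jump_naive = abs(naive[FL] - naive[FL - 1])        # large jump

# Flatten first (divide out the envelope), then replicate.
envelope = low                                     # here envelope == low
flat = [low[k] / envelope[k] for k in range(FL)]   # all 1.0
flattened = flat + flat + flat
jump_flat = abs(flattened[FL] - flattened[FL - 1]) # no jump

print(jump_naive, jump_flat)
```

Removing the jump this way costs no gain-adjustment bits, which is the motivation for flattening before replication.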
- An object of the present invention is to provide a speech encoding apparatus and speech encoding method that maintain the continuity of spectral energy and prevent speech quality degradation even when the low-band spectrum is copied multiple times into the high band.
- The speech encoding apparatus of the present invention employs first encoding means for encoding the low-band spectrum of a speech signal, and flattens the low-band spectrum using the LPC coefficients of the speech signal.
- FIG. 5A is an explanatory diagram of the operating principle of the present invention (decoded spectrum of the low band).
- FIG. 5B is an explanatory diagram of the operating principle of the present invention (spectrum after passing through an inverse filter).
- FIG. 5C is an explanatory diagram of the operating principle of the present invention (encoding of the high band).
- FIG. 5D is an explanatory diagram of the operating principle of the present invention (spectrum of the decoded signal).
- FIG. 6 is a block configuration diagram of a speech coding apparatus according to Embodiment 1 of the present invention.
- FIG. 7 is a block diagram of the second layer encoding section of the above speech encoding apparatus.
- FIG. 8 is an operation explanatory diagram of the filtering unit according to Embodiment 1 of the present invention.
- FIG. 9 is a block configuration diagram of the speech decoding apparatus according to Embodiment 1 of the present invention.
- FIG. 10 is a block diagram of the second layer decoding unit of the speech decoding apparatus.
- FIG. 11 is a block configuration diagram of a speech coding apparatus according to Embodiment 2 of the present invention.
- FIG. 12 is a block configuration diagram of a speech decoding apparatus according to Embodiment 2 of the present invention.
- FIG. 13 is a block configuration diagram of a speech coding apparatus according to Embodiment 3 of the present invention.
- FIG. 14 is a block configuration diagram of a speech decoding apparatus according to Embodiment 3 of the present invention.
- FIG. 15 is a block configuration diagram of a speech coding apparatus according to Embodiment 4 of the present invention.
- FIG. 16 is a block configuration diagram of a speech decoding apparatus according to Embodiment 4 of the present invention.
- FIG. 17 is a block diagram of a speech coding apparatus according to Embodiment 5 of the present invention.
- FIG. 18 is a block diagram of a speech decoding apparatus according to Embodiment 5 of the present invention.
- FIG. 19 is a block diagram of a speech coding apparatus according to Embodiment 5 of the present invention (Modification 1).
- FIG. 20 is a block configuration diagram of a speech coding apparatus according to Embodiment 5 of the present invention (Modification 2).
- FIG. 21 is a block configuration diagram of a speech decoding apparatus according to Embodiment 5 of the present invention (Modification 1).
- FIG. 22 is a block configuration diagram of a second layer encoding section according to Embodiment 6 of the present invention.
- FIG. 23 is a block configuration diagram of a spectrum modifying section according to Embodiment 6 of the present invention.
- FIG. 24 is a block configuration diagram of a second layer decoding section according to Embodiment 6 of the present invention.
- FIG. 25 is a block configuration diagram of a spectrum modifying section according to Embodiment 7 of the present invention.
- FIG. 26 is a block configuration diagram of a spectrum modifying section according to Embodiment 8 of the present invention.
- FIG. 27 is a block configuration diagram of a spectrum modifying section according to Embodiment 9 of the present invention.
- FIG. 28 is a block configuration diagram of a second layer encoding section according to Embodiment 10 of the present invention.
- FIG. 29 is a block configuration diagram of a second layer decoding section according to Embodiment 10 of the present invention.
- FIG. 30 is a block configuration diagram of a second layer encoding section according to Embodiment 11 of the present invention.
- FIG. 31 is a block configuration diagram of a second layer decoding section according to Embodiment 11 of the present invention.
- FIG. 32 is a block configuration diagram of a second layer encoding section according to Embodiment 12 of the present invention.
- FIG. 33 is a block configuration diagram of a second layer decoding section according to Embodiment 12 of the present invention.
- FL is a threshold frequency;
- 0—FL is the low band portion, and
- FL—FH is the high band portion.
- FIG. 5A shows the decoded spectrum of the low band obtained by a conventional encoding/decoding process.
- FIG. 5B shows the spectrum obtained by passing the decoded spectrum shown in FIG. 5A through an inverse filter having characteristics inverse to the spectral envelope.
- the low-band spectrum is flattened by passing the low-band decoded spectrum through an inverse filter having characteristics opposite to the spectrum envelope.
- In FIG. 5C, the flattened low-band spectrum is copied into the high band a plurality of times (here, twice), and the high band is encoded.
- As shown in FIG. 5B, the low-band spectrum has already been flattened at this point.
- Then, a spectrum of the decoded signal as shown in FIG. 5D is obtained by applying the spectral envelope to the spectrum extended to the signal bandwidth FH.
- In encoding the high band, a method can be used in which the low-band spectrum is used as the internal state of a pitch filter, and the high band of the spectrum is estimated by performing pitch filtering along the frequency axis from low frequency to high frequency. With this encoding method, only the filter information of the pitch filter needs to be encoded for the high band, so a low bit rate can be achieved.
- In the present embodiment, a case will be described in which encoding in the frequency domain is performed in both the first layer and the second layer. Further, in the present embodiment, the low-band spectrum is first flattened, and the flattened spectrum is then used repeatedly to encode the high-band spectrum.
- FIG. 6 shows the configuration of the speech encoding apparatus according to Embodiment 1 of the present invention.
- LPC analysis section 101 performs LPC analysis of the input speech signal and calculates the LPC coefficients a(i) (1 ≤ i ≤ NP).
- NP represents the LPC analysis order; a value of, for example, 10 to 18 is selected.
- The calculated LPC coefficients are input to LPC quantization section 102.
- LPC quantization section 102 quantizes LPC coefficients.
- From the viewpoints of quantization efficiency and stability checking, LPC quantization section 102 converts the LPC coefficients into LSP (Line Spectral Pair) parameters and then quantizes them.
- The quantized LPC coefficients are output to LPC decoding section 103, and the LPC coefficient encoded data is output to multiplexing section 109.
- the LPC decoding unit 103 decodes the quantized LPC coefficients to generate decoded LPC coefficients a (i) (1 ⁇ i ⁇ NP), and outputs them to the inverse filter unit 104.
- the inverse filter unit 104 configures an inverse filter using the decoded LPC coefficients, and passes the input speech signal through the inverse filter, thereby flattening the spectrum of the input speech signal.
- The inverse filter is expressed as Equation (1) or Equation (2).
- Equation (2) is the inverse filter when a resonance suppression coefficient γ (0 < γ < 1) is used to control the degree of flattening.
- The output signal e(n) obtained when the speech signal s(n) is input to the inverse filter of Equation (2) is expressed as Equation (4).
- the spectrum of the input audio signal is flattened by the inverse filter processing.
- the output signal of the inverse filter unit 104 (speech signal whose spectrum is flattened) is called a prediction residual signal.
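Equations (1), (2) and (4) are not reproduced in this text; a Python sketch under the standard LPC forms they presumably refer to — A(z) = 1 + Σ a(i)z⁻ⁱ (Eq. (1)), A(z/γ) with resonance suppression coefficient γ (Eq. (2)), and e(n) = s(n) + Σ γⁱ a(i)s(n−i) as the assumed form of Eq. (4) — is as follows. Coefficient values are hypothetical.

```python
# Sketch of the inverse (LPC analysis) filter producing the prediction
# residual, assuming the standard forms of Eqs. (1)/(2)/(4).

def inverse_filter(s, a, gamma=1.0):
    """Prediction residual e(n) of signal s for LPC coefficients a."""
    NP = len(a)
    e = []
    for n in range(len(s)):
        acc = s[n]
        for i in range(1, NP + 1):
            if n - i >= 0:
                acc += (gamma ** i) * a[i - 1] * s[n - i]
        e.append(acc)
    return e

# 2nd-order check: build s by the matching synthesis recursion
# s(n) = e(n) - a(1)s(n-1) - a(2)s(n-2) from a unit impulse, so the
# inverse filter with gamma = 1 recovers the impulse exactly.
a = [-1.5, 0.7]                  # hypothetical LPC coefficients
e_in = [1.0] + [0.0] * 19
s = []
for n in range(20):
    acc = e_in[n]
    for i in (1, 2):
        if n - i >= 0:
            acc -= a[i - 1] * s[n - i]
    s.append(acc)

e_out = inverse_filter(s, a, gamma=1.0)
```

The round trip shows why the inverse-filter output is flat: it removes exactly the spectral shaping that the LPC (envelope) model imposes.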
- Frequency domain transform section 105 performs frequency analysis on the prediction residual signal output from inverse filter section 104, and obtains a residual spectrum as a transform coefficient.
- the frequency domain transform unit 105 transforms a time domain signal into a frequency domain signal using, for example, MDCT (Modified Discrete Cosine Transform).
- The residual spectrum is input to first layer encoding section 106 and second layer encoding section 108.
- First layer encoding section 106 encodes the low band of the residual spectrum using TwinVQ or the like, and outputs the first layer encoded data obtained by this encoding to first layer decoding section 107 and multiplexing section 109.
- First layer decoding section 107 decodes the first layer encoded data to generate a first layer decoded spectrum, and outputs it to second layer encoding section 108. Note that first layer decoding section 107 outputs the first layer decoded spectrum before conversion to the time domain.
- Second layer encoding section 108 encodes the high band of the residual spectrum using the first layer decoded spectrum obtained by first layer decoding section 107, and outputs the second layer encoded data obtained by this encoding to multiplexing section 109.
- Specifically, second layer encoding section 108 uses the first layer decoded spectrum as the internal state of a pitch filter, and estimates the high band of the residual spectrum by pitch filtering. At this time, second layer encoding section 108 estimates the high band of the residual spectrum so as not to destroy the harmonic structure of the spectrum.
- Second layer encoding section 108 then encodes the filter information of the pitch filter. Further, second layer encoding section 108 estimates the high band of the residual spectrum using a residual spectrum that has been flattened.
- Multiplexing section 109 multiplexes the first layer encoded data, the second layer encoded data, and the LPC coefficient encoded data to generate a bit stream and output it.
- FIG. 7 shows the configuration of second layer encoding section 108.
- The first layer decoded spectrum S1(k) (0 ≤ k < FL) is input to internal state setting section 1081 from first layer decoding section 107.
- Internal state setting section 1081 sets the internal state of the filter used in filtering section 1082 using this first layer decoded spectrum.
- Pitch coefficient setting section 1084, under control from search section 1083, changes the pitch coefficient T little by little within a predetermined search range Tmin to Tmax and sequentially outputs it to filtering section 1082.
- Filtering section 1082 filters the first layer decoded spectrum based on the internal state of the filter set by internal state setting section 1081 and the pitch coefficient T output from pitch coefficient setting section 1084, and calculates an estimate S2′(k) of the residual spectrum. Details of this filtering process are described later.
- Search section 1083 calculates the similarity, a parameter indicating the degree of similarity between the residual spectrum S2(k) (0 ≤ k < FH) input from frequency domain transform section 105 and the estimate S2′(k) of the residual spectrum input from filtering section 1082.
- This similarity calculation is performed each time the pitch coefficient T is given from pitch coefficient setting section 1084, and the pitch coefficient that maximizes the calculated similarity (the optimum pitch coefficient) T′ (within the range Tmin to Tmax) is output to multiplexing section 1086.
- Furthermore, search section 1083 outputs the residual spectrum estimate S2′(k) generated using this pitch coefficient T′ to gain encoding section 1085.
- Gain encoding section 1085 calculates gain information for the residual spectrum S2(k) input from frequency domain transform section 105.
- In Equation (5), BL(j) represents the minimum frequency of the j-th subband, and BH(j) represents the maximum frequency of the j-th subband.
- Similarly, gain encoding section 1085 calculates the subband information B′(j) of the estimate S2′(k) of the residual spectrum according to Equation (6), and calculates the variation amount V(j) for each subband according to Equation (7).
- Gain encoding section 1085 then encodes the variation amount V(j) and outputs the index of the encoded variation amount to multiplexing section 1086.
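Equations (5)–(7) are not reproduced in this text, so the following Python sketch assumes a common reading: B(j) is the energy of the residual spectrum S2 in subband j (bins BL(j)..BH(j)), B′(j) is the same quantity for the estimate S2′, and V(j) = √(B(j)/B′(j)). All spectra and subband boundaries are hypothetical.

```python
# Hedged sketch of the per-subband gain variation amount V(j),
# assuming B(j) = subband energy and V(j) = sqrt(B(j) / B'(j)).
import math

def subband_energy(spec, bl, bh):
    """Energy of spectrum bins bl..bh inclusive (assumed Eq. (5)/(6))."""
    return sum(spec[k] ** 2 for k in range(bl, bh + 1))

def variations(s2, s2_est, bounds):
    """bounds: list of (BL(j), BH(j)) pairs covering the high band."""
    return [math.sqrt(subband_energy(s2, bl, bh) /
                      subband_energy(s2_est, bl, bh))
            for bl, bh in bounds]

s2     = [2.0, 2.0, 1.0, 1.0]   # target residual spectrum (high band)
s2_est = [1.0, 1.0, 2.0, 2.0]   # estimate from pitch filtering
V = variations(s2, s2_est, [(0, 1), (2, 3)])
print(V)   # first subband needs boosting, second needs attenuating
```

Encoding only these J scalar values, rather than a per-bin gain, is what keeps the gain-adjustment bit cost low.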
- Multiplexing section 1086 multiplexes the optimum pitch coefficient T′ input from search section 1083 and the index of the variation amount V(j) input from gain encoding section 1085 to generate second layer encoded data, and outputs it to multiplexing section 109.
- FIG. 8 shows how filtering section 1082 generates the spectrum of the band FL ≤ k < FH using the pitch coefficient T input from pitch coefficient setting section 1084.
- Here, for convenience, the spectrum over the entire frequency band (0 ≤ k < FH) is called S(k), and the filter function expressed by Equation (8) is used.
- The band 0 ≤ k < FL of S(k) stores the first layer decoded spectrum S1(k) as the internal state of the filter, while the band FL ≤ k < FH of S(k) stores the estimate S2′(k) of the residual spectrum.
- In the filtering process, the spectrum S(k−T) at a frequency T lower than k, together with the nearby spectra S(k−T−i) separated from it by i, are each multiplied by a predetermined weighting coefficient β, and the weighted spectra β·S(k−T−i) are summed; that is, the spectrum expressed by Equation (9) is substituted into S(k).
- The above filtering is performed over the range FL ≤ k < FH, clearing S(k) to zero in that range each time the pitch coefficient T is given from pitch coefficient setting section 1084. That is, S(k) is recalculated and output to search section 1083 every time the pitch coefficient T changes.
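The filtering and search steps above can be sketched in Python. This is an illustrative example, not the patent's implementation: it uses the simplest single-tap case of Equation (9) (the weighted sum collapses to S(k) = S(k−T)), and since the exact similarity measure is not reproduced in the text, a plain correlation is assumed. All spectra are hypothetical.

```python
# Sketch of pitch-filtering estimation of the high band plus the search
# for the optimum pitch coefficient T' (single-tap Eq. (9), assumed
# correlation similarity; hypothetical spectra).

FL, FH = 8, 16

def estimate_high_band(s1, T):
    """Fill bins FL..FH-1 by copying the spectrum T bins lower."""
    S = list(s1) + [0.0] * (FH - FL)   # internal state = low-band spectrum
    for k in range(FL, FH):
        S[k] = S[k - T]                # recursive substitution, Eq. (9)
    return S[FL:FH]

def search_pitch_coefficient(s1, s2_high, t_range):
    """Return the T maximizing correlation between estimate and target."""
    best_t, best_sim = None, float("-inf")
    for T in t_range:
        est = estimate_high_band(s1, T)
        sim = sum(a * b for a, b in zip(est, s2_high))
        if sim > best_sim:
            best_t, best_sim = T, sim
    return best_t

# Low band with period-4 harmonic structure; the true high band continues it.
s1 = [1.0, 0.0, 0.0, 0.0] * 2          # bins 0..7
s2_high = [1.0, 0.0, 0.0, 0.0] * 2     # bins 8..15
t_best = search_pitch_coefficient(s1, s2_high, range(1, FL + 1))
```

Note how the search lands on the harmonic spacing of the low band, which is how the estimation avoids destroying the harmonic structure of the spectrum.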
- FIG. 9 shows the configuration of the speech decoding apparatus according to Embodiment 1 of the present invention.
- the speech decoding apparatus 200 receives a bit stream transmitted from the speech encoding apparatus 100 shown in FIG.
- Demultiplexing section 201 separates the bitstream received from speech encoding apparatus 100 shown in FIG. 6 into first layer encoded data, second layer encoded data, and LPC coefficient encoded data;
- the first layer encoded data is output to first layer decoding section 202,
- the second layer encoded data is output to second layer decoding section 203,
- and the LPC coefficient encoded data is output to LPC decoding section 204.
- Demultiplexing section 201 also outputs layer information (information indicating which layers' encoded data are contained in the bitstream) to determination section 205.
- First layer decoding section 202 performs decoding using the first layer encoded data to generate a first layer decoded spectrum, and outputs it to second layer decoding section 203 and determination section 205.
- Second layer decoding section 203 generates a second layer decoded spectrum using the second layer encoded data and the first layer decoded spectrum, and outputs it to determination section 205. Details of second layer decoding section 203 are described later.
- LPC decoding section 204 decodes the LPC coefficient encoded data and outputs the decoded LPC coefficients to synthesis filter section 207.
- Here, speech encoding apparatus 100 transmits both the first layer encoded data and the second layer encoded data in the bitstream, but the second layer encoded data may be discarded somewhere along the communication path. Determination section 205 therefore determines, based on the layer information, whether the second layer encoded data is included in the bitstream. When the second layer encoded data is not included, second layer decoding section 203 does not generate a second layer decoded spectrum, so determination section 205 outputs the first layer decoded spectrum to time domain transform section 206. In this case, to match the order of the decoded spectrum obtained when the second layer encoded data is included, determination section 205 extends the order of the first layer decoded spectrum to FH and outputs the FL—FH band as zeros. When both the first layer encoded data and the second layer encoded data are included in the bitstream, determination section 205 outputs the second layer decoded spectrum to time domain transform section 206.
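The determination behavior described above can be sketched as follows; this is an illustrative Python example (function and variable names are hypothetical, not from the patent): when the second layer encoded data has been discarded, the first layer decoded spectrum is zero-extended from FL to FH so its order matches the full-band case.

```python
# Sketch of the determination logic: select the spectrum to pass to the
# time domain transform, zero-extending the low band when the second
# layer encoded data is missing. Names and values are hypothetical.

FL, FH = 8, 16

def select_spectrum(first_layer_spec, second_layer_spec, has_second_layer):
    if has_second_layer:
        return second_layer_spec
    # Extend the order to FH; the FL..FH-1 band is output as zeros.
    return list(first_layer_spec) + [0.0] * (FH - FL)

low = [0.5] * FL           # first layer decoded spectrum (low band only)
full = [0.5] * FH          # second layer decoded spectrum (full band)
out = select_spectrum(low, full, has_second_layer=False)
```

This keeps the downstream time domain transform oblivious to which layers actually arrived.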
- Time domain conversion section 206 converts the decoded spectrum input from determination section 205 into a signal in the time domain, generates a decoded residual signal, and outputs it to synthesis filter section 207.
- Synthesis filter section 207 constructs a synthesis filter using the decoded LPC coefficients a(i) (1 ≤ i ≤ NP) input from LPC decoding section 204.
- The synthesis filter H(z) is expressed as Equation (10) or Equation (11).
- Here, γ (0 < γ < 1) represents the resonance suppression coefficient.
- With the decoded residual signal given by time domain transform section 206 denoted e(n),
- the decoded signal s(n) output from the synthesis filter is expressed as Equation (12) when the synthesis filter of Equation (10) is used.
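Equations (10)–(12) are not reproduced in this text; the Python sketch below assumes the standard form H(z) = 1/A(z/γ), giving the recursion s(n) = e(n) − Σ γⁱ a(i)s(n−i) as the assumed form of Eq. (12). It pairs the synthesis filter with the encoder-side inverse filter of Eq. (2) to show that the round trip recovers the original signal. Coefficient and signal values are hypothetical.

```python
# Sketch of the synthesis filter (assumed form of Eqs. (10)/(12)) undoing
# the inverse filter (assumed form of Eqs. (2)/(4)).

def synthesis_filter(e, a, gamma=1.0):
    """s(n) = e(n) - sum_i gamma^i a(i) s(n-i)."""
    s = []
    for n in range(len(e)):
        acc = e[n]
        for i in range(1, len(a) + 1):
            if n - i >= 0:
                acc -= (gamma ** i) * a[i - 1] * s[n - i]
        s.append(acc)
    return s

def inverse_filter(s, a, gamma=1.0):
    """e(n) = s(n) + sum_i gamma^i a(i) s(n-i), the analysis-side filter."""
    return [s[n] + sum((gamma ** i) * a[i - 1] * s[n - i]
                       for i in range(1, len(a) + 1) if n - i >= 0)
            for n in range(len(s))]

a = [-1.2, 0.5]                               # hypothetical decoded LPC coefficients
s_orig = [0.3, -0.1, 0.7, 0.2, -0.4, 0.05]    # hypothetical signal
e = inverse_filter(s_orig, a, gamma=0.9)      # flattened residual
s_rec = synthesis_filter(e, a, gamma=0.9)     # recovers s_orig
```

The same γ must be used on both sides: the synthesis filter restores exactly the spectral envelope that the inverse filter removed.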
- FIG. 10 shows the configuration of second layer decoding section 203.
- The first layer decoded spectrum S1(k) is input from first layer decoding section 202 to internal state setting section 2031.
- Internal state setting section 2031 sets the internal state of the filter used in filtering section 2033 using the first layer decoded spectrum S1(k).
- Second layer encoded data is input to separation section 2032 from demultiplexing section 201.
- Separation section 2032 separates the second layer encoded data into information on the filtering coefficient (the optimum pitch coefficient T′) and information on the gain (the index of the variation amount V(j));
- the filtering coefficient information is output to filtering section 2033, and the gain information is output to gain decoding section 2034.
- Filtering section 2033 filters the first layer decoded spectrum S1(k) based on the internal state of the filter set by internal state setting section 2031 and the pitch coefficient T′ input from separation section 2032.
- Gain decoding section 2034 decodes the gain information input from separation section 2032 to obtain the decoded variation amount V(j).
- Spectrum adjusting section 2035 adjusts the spectral shape of the high band by applying the decoded per-subband variation amount V(j) input from gain decoding section 2034 to the decoded spectrum S′(k) input from filtering section 2033.
- speech decoding apparatus 200 can decode the bitstream transmitted from speech encoding apparatus 100 shown in FIG.
- In the present embodiment, the first layer is encoded in the time domain (for example, by CELP coding). Further, in the present embodiment, the spectrum of the first layer decoded signal is flattened using the decoded LPC coefficients obtained during the encoding process in the first layer.
- FIG. 11 shows the configuration of the speech coding apparatus according to Embodiment 2 of the present invention.
- the same components as those in the first embodiment (FIG. 6) are denoted by the same reference numerals, and the description thereof is omitted.
- Downsampling section 301 downsamples the input speech signal and outputs the speech signal at the desired sampling rate to first layer encoding section 302.
- First layer encoding section 302 encodes the speech signal downsampled to the desired sampling rate to generate first layer encoded data, and outputs it to first layer decoding section 303 and multiplexing section 109.
- First layer encoding section 302 uses, for example, CELP coding.
- When first layer encoding section 302 performs LPC coefficient encoding as in CELP coding, decoded LPC coefficients can be generated during the encoding process. Therefore, first layer encoding section 302 outputs the first layer decoded LPC coefficients generated during the encoding process to inverse filter section 304.
- First layer decoding section 303 performs decoding using the first layer encoded data to generate a first layer decoded signal, and outputs it to inverse filter section 304.
- Inverse filter section 304 forms an inverse filter using the first layer decoded LPC coefficients input from first layer encoding section 302, and flattens the spectrum of the first layer decoded signal by passing the first layer decoded signal through the inverse filter.
- The details of the inverse filter are the same as in Embodiment 1, and a description is omitted.
- The output signal of inverse filter section 304 (the first layer decoded signal whose spectrum has been flattened) is referred to as the first layer decoded residual signal.
- Frequency domain transform section 305 performs frequency analysis of the first layer decoded residual signal output from inverse filter section 304 to generate a first layer decoded spectrum, and outputs it to second layer encoding section 108.
- Delay section 306 gives a delay of a predetermined length to the input speech signal.
- The magnitude of this delay corresponds to the time delay that occurs when the input speech signal passes through downsampling section 301, first layer encoding section 302, first layer decoding section 303, inverse filter section 304, and frequency domain transform section 305.
- As described above, in the present embodiment, the spectrum of the first layer decoded signal is flattened using the decoded LPC coefficients (the first layer decoded LPC coefficients) obtained during the encoding process in the first layer, so the spectrum of the first layer decoded signal can be flattened using only the information in the first layer encoded data. Therefore, according to the present embodiment, no additional bits are needed to encode LPC coefficients for flattening the spectrum of the first layer decoded signal, and the spectrum can be flattened without increasing the amount of information.
- FIG. 12 shows the configuration of the speech decoding apparatus according to Embodiment 2 of the present invention.
- the speech decoding apparatus 400 receives a bit stream transmitted from the speech encoding apparatus 300 shown in FIG.
- Demultiplexing section 401 separates the bit stream received from speech encoding apparatus 300 shown in FIG. 11 into first layer encoded data, second layer encoded data, and LPC coefficient encoded data, and outputs the first layer encoded data to first layer decoding section 402, the second layer encoded data to second layer decoding section 405, and the LPC coefficient encoded data to LPC decoding section 407. Demultiplexing section 401 also outputs layer information (information indicating which layers' encoded data are included in the bit stream) to determination section 413.
- First layer decoding section 402 performs decoding processing using the first layer encoded data to generate a first layer decoded signal, and outputs it to inverse filter section 403 and upsampling section 410. First layer decoding section 402 also outputs the first layer decoded LPC coefficients generated during the decoding process to inverse filter section 403.
- Upsampling section 410 upsamples the first layer decoded signal to the same sampling rate as the input audio signal in FIG. 11, and outputs it to low-pass filter section 411 and determination section 413.
- Low-pass filter section 411 has its passband set to 0–FL, passes only the 0–FL frequency band of the upsampled first layer decoded signal to generate a low-frequency signal, and outputs it to adder 412.
- Inverse filter section 403 forms an inverse filter using the first layer decoded LPC coefficients input from first layer decoding section 402, and passes the first layer decoded signal through the inverse filter. Thus, a first layer decoded residual signal is generated and output to frequency domain transform section 404.
- Frequency domain transform section 404 performs frequency analysis on the first layer decoded residual signal output from inverse filter section 403 to generate a first layer decoded spectrum, and outputs it to second layer decoding section 405.
- Second layer decoding section 405 generates a second layer decoded spectrum using the second layer encoded data and the first layer decoded spectrum, and outputs it to time domain transform section 406. The details of second layer decoding section 405 are the same as those of second layer decoding section 203 of Embodiment 1.
- Time domain transform section 406 converts the second layer decoded spectrum into a time domain signal to generate a second layer decoded residual signal, and outputs it to synthesis filter section 408.
- LPC decoding section 407 decodes the LPC coefficient encoded data to generate decoded LPC coefficients, and outputs them to synthesis filter section 408.
- the synthesis filter unit 408 forms a synthesis filter using the decoded LPC coefficients input from the LPC decoding unit 407. Note that the details of the synthesis filter unit 408 are the same as those of the synthesis filter unit 207 (FIG. 9) of the first embodiment, and a description thereof will be omitted.
- the synthesis filter unit 408 generates the second layer synthesized signal s (n) in the same manner as in the first embodiment, and outputs it to the high-pass filter unit 409.
- High-pass filter section 409 has its passband set to FL–FH, passes only the FL–FH frequency band of the second layer synthesized signal to generate a high-frequency signal, and outputs it to adder 412.
- Adder 412 generates a second layer decoded signal by adding the low-frequency signal and the high-frequency signal, and outputs the second-layer decoded signal to determination unit 413.
- Based on the layer information input from demultiplexing section 401, determination section 413 determines whether or not the second layer encoded data is included in the bit stream, selects either the first layer decoded signal or the second layer decoded signal, and outputs the selected signal as the decoded signal. Specifically, determination section 413 outputs the first layer decoded signal when the bit stream does not contain the second layer encoded data, and outputs the second layer decoded signal when the bit stream contains both the first layer encoded data and the second layer encoded data.
- Low-pass filter section 411 and high-pass filter section 409 are used to mitigate the mutual influence between the low-frequency signal and the high-frequency signal. Therefore, if this mutual influence is small, speech decoding apparatus 400 may be configured without these filters. When these filters are not used, the calculations related to filtering become unnecessary, so the amount of calculation can be reduced.
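The band splitting and addition performed by low-pass filter section 411, high-pass filter section 409, and adder 412 can be sketched as follows; ideal FFT-domain masks stand in for the actual filters, and the function name, band edges, and sampling rate are illustrative:

```python
import numpy as np

def combine_bands(low_signal, high_signal, fl, fh, fs):
    """Keep only the 0-FL band of the first layer decoded signal and the
    FL-FH band of the second layer synthesized signal, then add them to
    form the second layer decoded signal (ideal brick-wall filters)."""
    n = len(low_signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    lo = np.fft.rfft(low_signal)
    hi = np.fft.rfft(high_signal)
    lo[freqs > fl] = 0.0                    # low-pass: 0-FL
    hi[(freqs <= fl) | (freqs > fh)] = 0.0  # high-pass: FL-FH
    return np.fft.irfft(lo + hi, n)
```

With real low/high-pass filters the transition bands overlap, which is why the embodiment notes the filters may be dropped when the mutual influence is small.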
- In this way, speech decoding apparatus 400 can decode the bit stream transmitted from speech encoding apparatus 300 shown in FIG. 11.
- The spectrum of the first layer excitation signal is flattened in the same way as the spectrum of the prediction residual signal obtained by removing the influence of the spectral envelope from the input speech signal. Therefore, in the present embodiment, the first layer excitation signal obtained during the encoding process in the first layer is treated as the signal whose spectrum has been flattened (that is, as the first layer decoded residual signal of Embodiment 2).
- FIG. 13 shows the configuration of the speech encoding apparatus according to Embodiment 3 of the present invention.
- the same components as those of the second embodiment (FIG. 11) are denoted by the same reference numerals, and the description thereof is omitted.
- First layer encoding section 501 performs encoding processing on the audio signal downsampled to the desired sampling rate, generates first layer encoded data, and outputs it to multiplexing section 109. First layer encoding section 501 uses, for example, CELP coding.
- first layer encoding unit 501 outputs the first layer excitation signal generated during the encoding process to frequency domain conversion unit 502.
- Here, the excitation signal refers to the signal input to the synthesis filter (or perceptually weighted synthesis filter) in first layer encoding section 501, which performs CELP coding, and is also called a drive signal.
- Frequency domain transform section 502 performs frequency analysis of the first layer excitation signal to generate a first layer decoded spectrum, and outputs the first layer decoded spectrum to second layer coding section 108.
- The delay given by delay section 503 is equal to the time delay that occurs when the input audio signal passes through downsampling section 301, first layer encoding section 501, and frequency domain transform section 502.
- Compared to Embodiment 2 (FIG. 11), first layer decoding section 303 and inverse filter section 304 are not required, so the amount of calculation can be reduced.
- FIG. 14 shows the configuration of the speech decoding apparatus according to Embodiment 3 of the present invention.
- the speech decoding apparatus 600 receives a bit stream transmitted from the speech encoding apparatus 500 shown in FIG.
- the same components as those in Embodiment 2 are denoted by the same reference numerals. The description is omitted.
- First layer decoding section 601 performs decoding processing using the first layer encoded data to generate a first layer decoded signal, and outputs it to upsampling section 410. Further, first layer decoding section 601 outputs the first layer excitation signal generated during the decoding process to frequency domain transform section 602.
- Frequency domain transform section 602 generates a first layer decoded spectrum by performing frequency analysis of the first layer excitation signal, and outputs the first layer decoded spectrum to second layer decoding section 405.
- speech decoding apparatus 600 can decode the bitstream transmitted from speech encoding apparatus 500 shown in FIG.
- In the present embodiment, the spectra of the first layer decoded signal and the input speech signal are flattened using the decoded LPC coefficients obtained in the second layer.
- FIG. 15 shows the configuration of speech coding apparatus 700 according to Embodiment 4 of the present invention.
- the same components as those of the second embodiment (FIG. 11) are denoted by the same reference numerals and description thereof is omitted.
- First layer encoding section 701 performs encoding processing on the audio signal downsampled to the desired sampling rate to generate first layer encoded data, and outputs it to first layer decoding section 702 and multiplexing section 109. First layer encoding section 701 uses, for example, CELP coding.
- First layer decoding section 702 performs decoding processing using the first layer encoded data to generate a first layer decoded signal, and outputs it to upsampling section 703.
- Up-sampling section 703 up-samples the sampling rate of the first layer decoded signal to be the same as the sampling rate of the input audio signal, and outputs it to inverse filter section 704.
- the inverse filter unit 704 receives the decoded LPC coefficients from the LPC decoding unit 103. Inverse filter section 704 constructs an inverse filter using the decoded LPC coefficients, and passes the first layer decoded signal after upsampling through the inverse filter, thereby flattening the spectrum of the first layer decoded signal.
- The output signal of inverse filter section 704 (the first layer decoded signal with a flattened spectrum) is called the first layer decoded residual signal.
- Frequency domain transform section 705 performs frequency analysis of the first layer decoded residual signal output from inverse filter section 704 to generate a first layer decoded spectrum, and outputs it to second layer encoding section 108.
- The delay of delay section 706 is equal to the time delay that occurs when the input audio signal passes through downsampling section 301, first layer encoding section 701, first layer decoding section 702, upsampling section 703, inverse filter section 704, and frequency domain transform section 705.
- FIG. 16 shows the configuration of the speech decoding apparatus according to Embodiment 4 of the present invention.
- the speech decoding apparatus 800 receives a bit stream transmitted from the speech encoding apparatus 700 shown in FIG.
- the same components as those of the second embodiment are denoted by the same reference numerals, and description thereof is omitted.
- First layer decoding section 801 performs decoding processing using the first layer encoded data to generate a first layer decoded signal, and outputs it to upsampling section 802.
- Upsampling section 802 upsamples the first layer decoded signal to the same sampling rate as the input audio signal in FIG. 15, and outputs it to inverse filter section 803 and determination section 413.
- the inverse filter unit 803 receives the decoded LPC coefficient from the LPC decoding unit 407.
- Inverse filter section 803 forms an inverse filter using the decoded LPC coefficients, passes the upsampled first layer decoded signal through this inverse filter to flatten its spectrum, and outputs the resulting first layer decoded residual signal to frequency domain transform section 804.
- Frequency domain transform section 804 performs frequency analysis of the first layer decoded residual signal output from inverse filter section 803 to generate a first layer decoded spectrum, and outputs it to second layer decoding section 405.
- In this way, speech decoding apparatus 800 can decode the bit stream transmitted from speech encoding apparatus 700 shown in FIG. 15.
- In this way, since the speech encoding apparatus flattens the spectra of both the first layer decoded signal and the input speech signal using the decoded LPC coefficients obtained in the second layer, the speech decoding apparatus can obtain the first layer decoded spectrum using LPC coefficients common to the speech encoding apparatus. Therefore, according to the present embodiment, when generating a decoded signal, the speech decoding apparatus need not process the low frequency part and the high frequency part separately as in Embodiments 2 and 3.
- As a result, a low-pass filter and a high-pass filter are not required, so the apparatus configuration is simplified and the amount of calculation related to the filtering process can be reduced.
- In the present embodiment, the degree of flattening is controlled by adaptively changing the resonance suppression coefficient of the inverse filter that performs spectral flattening according to the characteristics of the input audio signal.
- FIG. 17 shows the configuration of speech encoding apparatus 900 according to Embodiment 5 of the present invention.
- the same components as those in Embodiment 4 (FIG. 15) are denoted by the same reference numerals, and description thereof is omitted.
- inverse filter sections 904 and 905 are expressed by equation (2).
- Feature amount analysis section 901 analyzes the input speech signal to calculate a feature amount, and outputs it to feature amount encoding section 902.
- As the feature amount, a parameter representing the strength of resonance in the speech spectrum is used.
- the distance between adjacent LSP parameters is used.
- The smaller this distance, the greater the energy of the spectrum near the corresponding resonance frequency, and the stronger the resonance.
- In speech sections where resonance is strong, the resonance suppression coefficient γ (0 < γ < 1) is set small to weaken the degree of flattening. As a result, excessive attenuation of the spectrum in the vicinity of the resonance frequency due to the flattening process can be prevented, and deterioration of voice quality can be suppressed.
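One possible sketch of this control rule maps the minimum distance between adjacent LSP parameters to γ; the linear mapping and the constants g_min, g_max, and d_ref are illustrative assumptions, not the patent's actual rule:

```python
import numpy as np

def resonance_suppression_coef(lsp, g_min=0.5, g_max=0.95, d_ref=0.2):
    """Small adjacent-LSP distance -> strong resonance -> small gamma,
    which weakens the flattening performed by the inverse filter.
    g_min, g_max, and d_ref are illustrative constants."""
    d_min = np.min(np.diff(np.sort(np.asarray(lsp, dtype=float))))
    ratio = min(d_min / d_ref, 1.0)   # 0 corresponds to strongest resonance
    return g_min + (g_max - g_min) * ratio
```

With the inverse filter written as A(z/γ), applying γ amounts to using the bandwidth-expanded coefficients a_i·γ^i, so a smaller γ moves the filter toward the identity and weakens the flattening.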
- Feature amount encoding section 902 encodes the feature amount input from feature amount analysis section 901 to generate feature amount encoded data, and outputs it to feature amount decoding section 903 and multiplexing section 906. Feature amount decoding section 903 decodes the feature amount using the feature amount encoded data, determines the resonance suppression coefficient γ used in inverse filter sections 904 and 905 according to the decoded feature amount, and outputs it to inverse filter sections 904 and 905.
- The resonance suppression coefficient γ is increased as the periodicity of the input speech signal becomes stronger, and decreased as the periodicity becomes weaker.
- By controlling the resonance suppression coefficient γ in this manner, the spectrum is flattened more strongly in voiced portions and more weakly in unvoiced portions. Therefore, excessive flattening of the spectrum in unvoiced portions can be prevented, and deterioration in voice quality can be suppressed.
- Inverse filter sections 904 and 905 perform inverse filtering according to Equation (2) using the resonance suppression coefficient γ controlled by feature amount decoding section 903.
- Multiplexing section 906 multiplexes the first layer encoded data, the second layer encoded data, the LPC coefficient encoded data, and the feature amount encoded data to generate a bit stream, and outputs it.
- The delay of delay section 907 is equal to the time delay that occurs when the input audio signal passes through downsampling section 301, first layer encoding section 701, first layer decoding section 702, upsampling section 703, inverse filter section 905, and frequency domain transform section 705.
- FIG. 18 shows the configuration of the speech decoding apparatus according to Embodiment 5 of the present invention.
- This speech decoding apparatus 1000 receives the bit stream transmitted from the speech encoding apparatus 900 shown in FIG.
- the same components as those in Embodiment 4 are denoted by the same reference numerals, and description thereof is omitted.
- inverse filter section 1003 is expressed by equation (2).
- Demultiplexing section 1001 separates the bit stream received from speech encoding apparatus 900 shown in FIG. 17 into first layer encoded data, second layer encoded data, LPC coefficient encoded data, and feature amount encoded data; it outputs the first layer encoded data to the first layer decoding section, the second layer encoded data to second layer decoding section 405, the LPC coefficient encoded data to LPC decoding section 407, and the feature amount encoded data to feature amount decoding section 1002. Demultiplexing section 1001 also outputs layer information (information indicating which layers' encoded data are included in the bit stream) to determination section 413.
- Feature amount decoding section 1002 decodes the feature amount using the feature amount encoded data, determines the resonance suppression coefficient γ used in inverse filter section 1003 according to the decoded feature amount, and outputs it to inverse filter section 1003.
- Inverse filter section 1003 performs inverse filtering according to Equation (2) using the resonance suppression coefficient γ controlled by feature amount decoding section 1002.
- speech decoding apparatus 1000 can decode the bitstream transmitted from speech encoding apparatus 900 shown in FIG.
- LPC quantization section 102 (FIG. 17) quantizes the LPC coefficients after converting them into LSP parameters, as described above. Therefore, in the present embodiment, the speech encoding apparatus may be configured as shown in FIG. 19. That is, in speech encoding apparatus 1100 shown in FIG. 19, feature amount analysis section 901 is not provided, and LPC quantization section 102 calculates the distance between LSP parameters and outputs it to feature amount encoding section 902.
- Furthermore, when LPC quantization section 102 generates decoded LSP parameters, the speech encoding apparatus may be configured as shown in FIG. 20. That is, in speech encoding apparatus 1300 shown in FIG. 20, feature amount analysis section 901, feature amount encoding section 902, and feature amount decoding section 903 are not provided; LPC quantization section 102 generates the decoded LSP parameters, calculates the distance between the decoded LSP parameters, and outputs it to inverse filter sections 904 and 905.
- FIG. 21 shows the configuration of speech decoding apparatus 1400 that decodes the bitstream transmitted from speech encoding apparatus 1300 shown in FIG.
- In speech decoding apparatus 1400, LPC decoding section 407 further generates decoded LSP parameters from the decoded LPC coefficients, calculates the distance between the decoded LSP parameters, and outputs it to inverse filter section 1003.
- When the dynamic range of the low-frequency spectrum serving as the source of duplication (the ratio of the maximum to the minimum value of the spectral amplitude) is greater than the dynamic range of the high-frequency spectrum serving as the target of duplication, excessive peaks appear in the estimated high-frequency spectrum. In the decoded signal obtained from a spectrum with such excessive peaks, noise that sounds like a ringing bell is generated, and as a result the subjective quality deteriorates.
- A large quantization error occurs when the number of encoding candidates is insufficient, that is, when the bit rate is low. If such a large quantization error occurs, the dynamic range of the low-frequency spectrum is not adjusted sufficiently, resulting in quality degradation. In particular, if an encoding candidate representing a dynamic range larger than that of the high-frequency spectrum is selected, an excessive peak is likely to occur in the high-frequency spectrum, and the quality degradation may become noticeable.
- In the present embodiment, the technique of bringing the dynamic range of the low-band spectrum close to the dynamic range of the high-band spectrum is applied to each of the above embodiments, and when second layer encoding section 108 encodes the modification information, encoding candidates having a small dynamic range are more easily selected than encoding candidates having a large dynamic range.
- FIG. 22 shows the configuration of second layer encoding section 108 according to Embodiment 6 of the present invention.
- the same components as those in Embodiment 1 (FIG. 7) are denoted by the same reference numerals, and description thereof is omitted.
- Spectrum modifying section 1087 receives the first layer decoded spectrum Sl(k) (0 ≤ k < FL) from first layer decoding section 107, and the residual spectrum S2(k) (0 ≤ k < FH) from frequency domain transform section 105.
- Spectrum modifying section 1087 adjusts the dynamic range of the decoded spectrum Sl(k) to an appropriate level; that is, it changes the dynamic range by modifying the decoded spectrum Sl(k).
- Spectrum modifying section 1087 encodes modification information representing how the decoded spectrum Sl(k) has been modified, and outputs it to multiplexing section 1086. Further, spectrum modifying section 1087 outputs the decoded spectrum after modification (the modified decoded spectrum) Sl′(j, k) to internal state setting section 1081.
- The configuration of spectrum modifying section 1087 is shown in FIG. 23.
- Spectrum modifying section 1087 modifies the decoded spectrum Sl(k) to bring its dynamic range closer to the dynamic range of the high frequency part (FL ≤ k < FH) of the residual spectrum S2(k). Spectrum modifying section 1087 also encodes the modification information and outputs it.
- Modified spectrum generation section 1101 generates the modified decoded spectrum Sl′(j, k) by modifying the decoded spectrum Sl(k), and outputs it to subband energy calculation section 1102.
- Here, j is an index identifying each encoding candidate (each piece of modification information) in codebook 1111. Each encoding candidate (each piece of modification information) contained in codebook 1111 is used to modify the decoded spectrum Sl(k).
- Here, an example is given in which the spectrum is modified using an exponential function. Each encoding candidate a(j) is assumed to lie in the range 0 ≤ a(j) ≤ 1, so the modified decoded spectrum Sl′(j, k) is expressed as in Equation (15). Here, sign( ) represents a function that returns the positive or negative sign of its argument. Therefore, the dynamic range of the modified decoded spectrum Sl′(j, k) decreases as the encoding candidate a(j) approaches 0.
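Assuming Equation (15) takes the form Sl′(j, k) = sign(Sl(k))·|Sl(k)|^a(j), the deformation and its effect on the dynamic range can be sketched as:

```python
import numpy as np

def modify_spectrum(spec, a_j):
    """Eq. (15)-style deformation: raise each coefficient's magnitude to
    the power a(j), 0 < a(j) <= 1, keeping its sign.  Smaller a(j)
    compresses the dynamic range more strongly."""
    spec = np.asarray(spec, dtype=float)
    return np.sign(spec) * np.abs(spec) ** a_j

def dynamic_range(spec):
    """Ratio of the maximum to the minimum spectral magnitude."""
    mags = np.abs(np.asarray(spec, dtype=float))
    return mags.max() / mags.min()
```

For example, a spectrum with a dynamic range of 100 has its range reduced to 10 by a(j) = 0.5, since (max/min)^0.5 applies to the ratio as well.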
- Subband energy calculation section 1102 divides the frequency band of the modified decoded spectrum Sl′(j, k) into a plurality of subbands, obtains the average energy (subband energy) P1(j, n) of each subband, and outputs it to variance calculation section 1103.
- Here, n represents the subband number.
- To represent the degree of variation of the subband energies P1(j, n), variance calculation section 1103 obtains their variance σ1(j)². Variance calculation section 1103 then outputs the variance σ1(j)² for encoding candidate (modification information) j to subtraction section 1106.
- Similarly, subband energy calculation section 1104 divides the high frequency part of the residual spectrum S2(k) into a plurality of subbands, and obtains the average energy (subband energy) P2(n) of each subband.
- Variance calculation section 1105 obtains the variance σ2² of the subband energies P2(n) to represent their degree of variation, and outputs it to subtraction section 1106.
- Subtraction section 1106 subtracts the variance σ1(j)² from the variance σ2², and outputs the error signal obtained by this subtraction to determination section 1107 and weighted error calculation section 1108.
- Determination section 1107 determines the sign (positive or negative) of the error signal, and based on the determination result determines the weight to be given to weighted error calculation section 1108: it selects weight w1 when the sign of the error signal is positive, and weight w2 (> w1) when it is negative.
- Weighted error calculation section 1108 first calculates the square of the error signal input from subtraction section 1106, then multiplies it by the weight w (w1 or w2) input from determination section 1107 to obtain the weighted square error E, and outputs E to search section 1109. The weighted square error E is expressed as in Equation (17).
- Search section 1109 controls codebook 1111 so that the encoding candidates (modification information) stored in codebook 1111 are sequentially output to modified spectrum generation section 1101, and searches for the encoding candidate (modification information) that minimizes the weighted square error E. Search section 1109 then outputs the index j of that encoding candidate as the optimal modification information. Modified spectrum generation section 1110 modifies the decoded spectrum Sl(k) according to the optimal modification information j to generate the modified decoded spectrum, and outputs it.
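The candidate search of sections 1101 through 1109 can be sketched as follows; the exponential deformation of Equation (15) is assumed for the modified spectrum, the variance of per-subband average energies stands in for σ1(j)² and σ2², and the weights w1 < w2 are illustrative values expressing the bias toward candidates that suppress the dynamic range:

```python
import numpy as np

def subband_energy_variance(spec, n_subbands):
    """Variance of per-subband average energies (sections 1102/1103)."""
    bands = np.array_split(np.asarray(spec, dtype=float) ** 2, n_subbands)
    energies = np.array([b.mean() for b in bands])
    return energies.var()

def search_deformation(low_spec, high_spec, codebook,
                       n_subbands=4, w1=0.5, w2=1.0):
    """Return the index j of the codebook exponent a(j) minimizing the
    weighted squared error between the target variance (high band) and
    the variance of the modified low-band spectrum.  w1 < w2 makes
    positive errors (modified range smaller than target) cheaper."""
    target = subband_energy_variance(high_spec, n_subbands)
    best_j, best_e = 0, float("inf")
    for j, a in enumerate(codebook):
        mod = np.sign(low_spec) * np.abs(low_spec) ** a   # Eq. (15)
        err = target - subband_energy_variance(mod, n_subbands)
        e = (w1 if err >= 0 else w2) * err ** 2
        if e < best_e:
            best_j, best_e = j, e
    return best_j
```

Because positive errors are weighted by the smaller w1, two candidates with the same absolute error resolve in favor of the one whose modified spectrum has the smaller dynamic range, matching the selection bias described below.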
- FIG. 24 shows the configuration of second layer decoding section 203 according to Embodiment 6 of the present invention.
- the same components as those in Embodiment 1 are denoted by the same reference numerals, and description thereof is omitted.
- Modified spectrum generation section 2036 modifies the first layer decoded spectrum Sl(k) input from first layer decoding section 202, based on the optimal modification information j input from demultiplexing section 2032, to generate the modified decoded spectrum Sl′(j, k).
- the modified spectrum generation unit 2036 is provided corresponding to the modified spectrum generation unit 1110 on the speech coding apparatus side, and performs the same processing as the modified spectrum generation unit 1110.
- the case where the error signal is positive is a case where the degree of variation of the modified decoded spectrum S1 ′ is smaller than the degree of variation of the residual spectrum S2, which is the target value. That is, this corresponds to the dynamic range of the modified decoded spectrum S1 ′ generated on the speech decoding apparatus side being smaller than the dynamic range of the residual spectrum S2.
- the case where the error signal is negative is a case where the degree of variation of the modified decoded spectrum S1 ′ is larger than the degree of variation of the residual spectrum S2, which is the target value. That is, this corresponds to the dynamic range of the modified decoded spectrum S1 ′ generated on the speech decoding apparatus side becoming larger than the dynamic range of the residual spectrum S2.
- As a result, encoding candidates that generate a modified decoded spectrum Sl′ whose dynamic range is smaller than the dynamic range of the residual spectrum S2 are more easily selected; that is, encoding candidates that suppress the dynamic range are preferentially selected. Therefore, the frequency with which the dynamic range of the estimated spectrum becomes larger than the dynamic range of the high frequency part of the residual spectrum decreases.
- In the present embodiment, spectrum modification using an exponential function has been taken as an example, but the modification method is not limited to this; other methods, such as spectrum modification using a logarithmic function, may also be used.
- FIG. 25 shows the configuration of spectrum deforming section 1087 according to Embodiment 7 of the present invention.
- the same components as those in Embodiment 6 (FIG. 23) are denoted by the same reference numerals and description thereof is omitted.
- Variation degree calculation section 1112-1 calculates the degree of variation of the decoded spectrum Sl(k) from the distribution of its values in the low frequency part, and outputs the result to threshold setting sections 1113-1 and 1113-2. Specifically, the standard deviation σ1 of the decoded spectrum Sl(k) is used as the degree of variation.
- the threshold setting unit 1113-1 obtains the first threshold TH1 using the standard deviation ⁇ 1 and outputs the first threshold TH1 to the average spectrum calculation unit 1114-1 and the modified spectrum generation unit 1110.
- the first threshold value TH1 is a threshold value for specifying a spectrum having a relatively large amplitude in the decoded spectrum SI (k), and a value obtained by multiplying the standard deviation ⁇ 1 by a predetermined constant a is used.
- the threshold setting unit 1113-2 obtains the second threshold TH2 using the standard deviation ⁇ 1 and outputs the second threshold TH2 to the average spectrum calculation unit 1114-2 and the modified spectrum generation unit 1110.
- The second threshold TH2 is a threshold for identifying spectra having relatively small amplitude in the low frequency part of the decoded spectrum Sl(k); a value obtained by multiplying the standard deviation σ1 by a predetermined constant b (< a) is used.
- Average spectrum calculation section 1114-1 obtains the average amplitude value (hereinafter, first average value) of the spectra whose amplitude is larger than the first threshold TH1, and outputs it to modified vector calculation section 1115. Specifically, average spectrum calculation section 1114-1 compares the values of the spectra in the low frequency part of the decoded spectrum Sl(k) with the value (m1 + TH1) obtained by adding the first threshold TH1 to the average value m1 of the decoded spectrum Sl(k), and identifies the spectra having values larger than this value (step 1). Next, average spectrum calculation section 1114-1 compares the values of the spectra in the low frequency part of the decoded spectrum Sl(k) with the value (m1 − TH1) obtained by subtracting the first threshold TH1 from the average value m1, and identifies the spectra having values smaller than this value (step 2). Then, average spectrum calculation section 1114-1 obtains the average amplitude value of the spectra identified in step 1 and step 2, and outputs it to modified vector calculation section 1115.
- Average spectrum calculation section 1114-2 obtains the average amplitude value (hereinafter, second average value) of the spectra whose amplitude is smaller than the second threshold TH2, and outputs it to modified vector calculation section 1115. Specifically, average spectrum calculation section 1114-2 compares the values of the spectra in the low frequency part of the decoded spectrum Sl(k) with the value (m1 + TH2) obtained by adding the second threshold TH2 to the average value m1 of the decoded spectrum Sl(k), and identifies the spectra having values smaller than this value (step 1). Next, average spectrum calculation section 1114-2 compares those values with the value (m1 − TH2) obtained by subtracting the second threshold TH2 from the average value m1, and identifies the spectra having values larger than this value (step 2). Then, average spectrum calculation section 1114-2 obtains the average amplitude value of the spectra identified in both step 1 and step 2, and outputs it to modified vector calculation section 1115.
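The two-threshold statistics computed by sections 1112-1 through 1114-2 for the low band can be sketched as follows; the constants a and b stand for the document's unspecified predetermined constants and carry illustrative values:

```python
import numpy as np

def outer_inner_averages(spec, a=1.0, b=0.5):
    """TH1 = a*sigma selects the large-amplitude spectra (outside
    m +/- TH1); TH2 = b*sigma with b < a selects the small-amplitude
    ones (inside m +/- TH2).  Returns the mean magnitude of each group
    (the first and second average values)."""
    spec = np.asarray(spec, dtype=float)
    m, sigma = spec.mean(), spec.std()
    th1, th2 = a * sigma, b * sigma
    outer = spec[(spec > m + th1) | (spec < m - th1)]   # steps 1 and 2
    inner = spec[(spec < m + th2) & (spec > m - th2)]   # both steps
    first = np.abs(outer).mean() if outer.size else 0.0
    second = np.abs(inner).mean() if inner.size else 0.0
    return first, second
```

The large-amplitude group is the union of the two comparisons (a value cannot exceed m + TH1 and fall below m − TH1 at once), while the small-amplitude group is their intersection.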
- Degree-of-variation calculation section 1112-2 calculates the degree of variation of residual spectrum S2(k) from the distribution of the high frequency part of residual spectrum S2(k), and outputs it to threshold setting sections 1113-3 and 1113-4. Specifically, the degree of variation is the standard deviation σ2 of residual spectrum S2(k).
- Threshold setting section 1113-3 obtains third threshold TH3 using standard deviation σ2 and outputs it to average spectrum calculation section 1114-3.
- Third threshold TH3 is a threshold for identifying the spectra having relatively large amplitude in the high frequency part of residual spectrum S2(k); the value obtained by multiplying standard deviation σ2 by a predetermined constant c is used.
- Threshold setting section 1113-4 obtains fourth threshold TH4 using standard deviation σ2 and outputs it to average spectrum calculation section 1114-4.
- Fourth threshold TH4 is a threshold for identifying the spectra having relatively small amplitude in the high frequency part of residual spectrum S2(k); the value obtained by multiplying standard deviation σ2 by a predetermined constant d (d < c) is used.
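As a small illustration of the two thresholds above, both can be derived from the standard deviation of the high band; the constants c and d and the function name are placeholders, not values from the patent.

```python
import statistics


def amplitude_thresholds(high_band, c, d):
    """TH3 = c * sigma identifies relatively large amplitudes,
    TH4 = d * sigma (with d < c) relatively small ones."""
    assert d < c, "the description requires d < c"
    sigma = statistics.pstdev(high_band)  # population standard deviation
    return c * sigma, d * sigma


th3, th4 = amplitude_thresholds([2.0, -2.0, 2.0, -2.0], c=1.0, d=0.5)
```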
- Average spectrum calculation section 1114-3 obtains the average amplitude value (hereinafter referred to as the third average value) of the spectra whose amplitude is larger than third threshold TH3, and outputs it to modified vector calculation section 1115. Specifically, average spectrum calculation section 1114-3 compares the values of the spectra in the high frequency part of residual spectrum S2(k) with the value (m3 + TH3) obtained by adding third threshold TH3 to the average value m3 of residual spectrum S2(k), and identifies the spectra having values larger than (m3 + TH3) (step 1). Next, average spectrum calculation section 1114-3 compares the values of the spectra in the high frequency part of residual spectrum S2(k) with the value (m3 - TH3) obtained by subtracting third threshold TH3 from the average value m3, and identifies the spectra having values smaller than (m3 - TH3) (step 2). Then, average spectrum calculation section 1114-3 obtains the average amplitude value of the spectra identified in step 1 and step 2, and outputs it to modified vector calculation section 1115.
- Average spectrum calculation section 1114-4 obtains the average amplitude value (hereinafter referred to as the fourth average value) of the spectra whose amplitude is smaller than fourth threshold TH4, and outputs it to modified vector calculation section 1115. Specifically, average spectrum calculation section 1114-4 compares the values of the spectra in the high frequency part of residual spectrum S2(k) with the value (m3 + TH4) obtained by adding fourth threshold TH4 to the average value m3 of residual spectrum S2(k), and identifies the spectra having values smaller than (m3 + TH4) (step 1). Next, average spectrum calculation section 1114-4 compares the values of the spectra in the high frequency part of residual spectrum S2(k) with the value (m3 - TH4) obtained by subtracting fourth threshold TH4 from the average value m3, and identifies the spectra having values larger than (m3 - TH4) (step 2). Then, average spectrum calculation section 1114-4 obtains the average amplitude value of the spectra identified in both step 1 and step 2, and outputs it to modified vector calculation section 1115.
- Modified vector calculation section 1115 calculates the modified vector as follows, using the first average value, the second average value, the third average value, and the fourth average value. That is, modified vector calculation section 1115 obtains the ratio between the third average value and the first average value (hereinafter referred to as the first gain) and the ratio between the fourth average value and the second average value (hereinafter referred to as the second gain), and outputs the first gain and the second gain to subtraction section 1106 as modified vector g(i).
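The modified vector above is simply a pair of amplitude-ratio gains; a minimal sketch, with the function name assumed:

```python
def modified_vector(first_avg, second_avg, third_avg, fourth_avg):
    """g(1): ratio of the target's large-amplitude average to the decoded
    spectrum's; g(2): the same ratio for the small-amplitude averages."""
    g1 = third_avg / first_avg    # first gain
    g2 = fourth_avg / second_avg  # second gain
    return (g1, g2)
```

A first gain below 1 together with a second gain above 1 means the target high band has a smaller dynamic range than the decoded low band.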
- Subtraction section 1106 subtracts an encoding candidate belonging to modified vector codebook 1116 from modified vector g(i), and outputs the error signal obtained by this subtraction to determination section 1107 and weighted error calculation section 1108. Each encoding candidate is represented as v(j, i), where j is an index identifying each encoding candidate (each piece of modification information) of modified vector codebook 1116.
- Determination section 1107 determines the sign (positive or negative) of the error signal and, based on the determination result, determines the weight to be given to weighted error calculation section 1108 for each of first gain g(1) and second gain g(2). For first gain g(1), determination section 1107 selects one predetermined weight when the sign of the error signal is positive and another when it is negative, and gives the selected weight to weighted error calculation section 1108. Determination section 1107 selects the weight for second gain g(2) in the same manner.
- Weighted error calculation section 1108 first calculates the square value of the error signal input from subtraction section 1106, and then calculates weighted square error E by taking, for each of first gain g(1) and second gain g(2), the product of the squared error and the weight input from determination section 1107, and summing the products. Weighted error calculation section 1108 outputs weighted square error E, which is expressed as in Equation (19), to search section 1109.
- Search section 1109 controls modified vector codebook 1116 so as to sequentially output the encoding candidates (modification information) stored in modified vector codebook 1116 to subtraction section 1106, and searches for the encoding candidate (modification information) that minimizes weighted square error E. Search section 1109 takes the index j_opt of the encoding candidate that minimizes weighted square error E as the optimal modification information.
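The sign-dependent weighting and the codebook search described above can be sketched as below. This assumes Equation (19) has the form E = Σ_i w · (g(i) − v(j, i))², with the weight chosen per component from the sign of the error; the actual weight values are left open in the description.

```python
def search_modified_vector(g, codebook, w_pos, w_neg):
    """Return the index j of the candidate v(j, i) minimizing the weighted
    square error E = sum_i w * (g[i] - v[j][i])**2, where w is w_pos for a
    positive error and w_neg for a negative one."""
    best_j, best_e = 0, float("inf")
    for j, v in enumerate(codebook):
        e = 0.0
        for gi, vi in zip(g, v):
            err = gi - vi
            w = w_pos if err > 0 else w_neg
            e += w * err * err
        if e < best_e:
            best_j, best_e = j, e
    return best_j
```

With w_pos larger than w_neg, candidates falling short of the target gain are penalized more than candidates exceeding it, so the asymmetric weighting biases the quantizer in one direction.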
- Modified spectrum generation section 1110 modifies decoded spectrum S1(k) using first threshold TH1, second threshold TH2, and optimal modification information j_opt, generates modified decoded spectrum S1′(j_opt, k) corresponding to optimal modification information j_opt, and outputs it to internal state setting section 1081. Specifically, modified spectrum generation section 1110 first uses optimal modification information j_opt to generate a decoded value of the ratio between the third average value and the first average value (hereinafter referred to as the decoded first gain) and a decoded value of the ratio between the fourth average value and the second average value (hereinafter referred to as the decoded second gain).
- Next, modified spectrum generation section 1110 compares the amplitude values of decoded spectrum S1(k) with first threshold TH1, identifies the spectra having amplitude larger than first threshold TH1, and multiplies these spectra by the decoded first gain to generate modified decoded spectrum S1′(j_opt, k). Similarly, modified spectrum generation section 1110 compares the amplitude values of decoded spectrum S1(k) with second threshold TH2, identifies the spectra having amplitude smaller than second threshold TH2, and multiplies these spectra by the decoded second gain to generate modified decoded spectrum S1′(j_opt, k). For spectra whose amplitude lies between the two thresholds, modified spectrum generation section 1110 uses a gain having an intermediate value between the decoded first gain and the decoded second gain. For example, modified spectrum generation section 1110 obtains the decoding gain y corresponding to an amplitude x from a characteristic curve determined by the decoded first gain, the decoded second gain, first threshold TH1, and second threshold TH2, and multiplies the amplitude of decoded spectrum S1(k) by this gain. That is, decoding gain y is a linear interpolation between the decoded first gain and the decoded second gain.
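The gain rule described above (decoded first gain above TH1, decoded second gain below TH2, linear interpolation in between) can be sketched as follows; the ordering TH2 < TH1 and the function name are assumptions.

```python
def modify_spectrum(spectrum, th1, th2, g1, g2):
    """Scale each spectral component by an amplitude-dependent gain:
    g1 (decoded first gain) above th1, g2 (decoded second gain) below th2,
    and a linear interpolation of the two for amplitudes in between."""
    assert th2 < th1
    out = []
    for s in spectrum:
        a = abs(s)
        if a > th1:
            g = g1
        elif a < th2:
            g = g2
        else:
            # straight line through (th2, g2) and (th1, g1)
            g = g2 + (g1 - g2) * (a - th2) / (th1 - th2)
        out.append(g * s)
    return out
```

With g1 < 1 and g2 > 1 this compresses the dynamic range: peaks are attenuated and valleys are raised, which is the behavior the surrounding embodiments aim at.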
- FIG. 26 shows the configuration of spectrum deforming section 1087 according to Embodiment 8 of the present invention.
- the same components as those in Embodiment 6 (FIG. 23) are denoted by the same reference numerals, and description thereof is omitted.
- Variance σ2^2 is input to correction section 1117 from variance calculation section 1105. Correction section 1117 performs correction processing to reduce the value of variance σ2^2 and outputs the result to subtraction section 1106. Specifically, correction section 1117 multiplies variance σ2^2 by a value that is greater than or equal to 0 and less than 1.
- Subtraction section 1106 subtracts variance σ1(j)^2 from the variance after correction processing, and outputs the error signal obtained by this subtraction to error calculation section 1118.
- the error calculation unit 1118 calculates the square value (square error) of the error signal input from the subtraction unit 1106 and outputs it to the search unit 1109.
- Search section 1109 controls codebook 1111 so as to sequentially output the encoding candidates (modification information) stored in codebook 1111 to modified spectrum generation section 1101, and searches for the encoding candidate (modification information) that minimizes the square error. Then, search section 1109 takes the index j_opt of the encoding candidate that minimizes the square error as the optimal modification information. In this way, search section 1109 searches for the encoding candidate using the variance after correction processing, that is, a variance with a smaller value, as the target value. Therefore, since the speech decoding apparatus can suppress the dynamic range of the estimated spectrum, the frequency of occurrence of the excessive peaks described above can be further reduced.
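A minimal sketch of the correction-and-search idea in this embodiment; the shrink factor alpha and the function name are illustrative assumptions, and the codebook is reduced to a plain list of candidate variances.

```python
def search_with_shrunk_target(var2, candidate_vars, alpha=0.5):
    """Multiply the target variance by alpha (0 <= alpha < 1) before the
    search, so the selected candidate corresponds to a smaller dynamic range."""
    assert 0.0 <= alpha < 1.0
    target = alpha * var2
    # pick the candidate variance closest to the shrunk target (square error)
    return min(range(len(candidate_vars)),
               key=lambda j: (target - candidate_vars[j]) ** 2)
```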
- Note that the value multiplied by variance σ2^2 may be made variable in accordance with the characteristics of the input speech signal. For example, correction section 1117 may use a larger multiplier when the pitch periodicity of the input speech signal is weak (for example, when the pitch gain is small) and a smaller multiplier when the pitch periodicity of the input speech signal is strong.
- FIG. 27 shows the configuration of spectrum deforming section 1087 according to Embodiment 9 of the present invention.
- the same components as those in Embodiment 7 (FIG. 25) are denoted by the same reference numerals, and description thereof is omitted.
- Modified vector g(i) is input to correction section 1117 from modified vector calculation section 1115.
- Correction section 1117 performs at least one of correction processing for reducing the value of first gain g(1) and correction processing for increasing the value of second gain g(2), and outputs the result to subtraction section 1106. Specifically, correction section 1117 multiplies first gain g(1) by a value that is greater than or equal to 0 and less than 1, and multiplies second gain g(2) by a value greater than 1.
- Subtracting section 1106 subtracts encoding candidates belonging to modified vector codebook 1116 from the modified vector after correction processing, and outputs an error signal obtained by this subtraction to error calculating section 1118.
- the error calculation unit 1118 calculates the square value (square error) of the error signal input from the subtraction unit 1106 and outputs it to the search unit 1109.
- Search section 1109 controls modified vector codebook 1116 so as to sequentially output the encoding candidates (modification information) stored in modified vector codebook 1116 to subtraction section 1106, and searches for the encoding candidate (modification information) that minimizes the square error. Then, search section 1109 takes the index j_opt of the encoding candidate that minimizes the square error as the optimal modification information. In this way, search section 1109 searches for the encoding candidate using the modified vector after correction processing, that is, a modified vector that reduces the dynamic range, as the target value. Therefore, since the speech decoding apparatus can suppress the dynamic range of the estimated spectrum, the frequency of occurrence of the excessive peaks described above can be further reduced.
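The correction in this embodiment can be sketched the same way; the multiplier values and function name are illustrative only.

```python
def correct_modified_vector(g, shrink=0.8, grow=1.2):
    """Reduce the first gain (applied to large-amplitude components) and
    increase the second gain (applied to small-amplitude components), so the
    codebook entry selected afterwards corresponds to a smaller dynamic range."""
    assert 0.0 <= shrink < 1.0 and grow > 1.0
    g1, g2 = g
    return (shrink * g1, grow * g2)
```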
- Note that correction section 1117 may vary the value multiplied by modified vector g(i) in accordance with the characteristics of the input speech signal. Such adaptation, as in Embodiment 8, makes it difficult for an excessively large spectral peak to be generated only for signals having strong pitch periodicity (for example, vowel parts), and as a result the perceptual sound quality can be improved.
- FIG. 28 shows the configuration of second layer encoding section 108 according to Embodiment 10 of the present invention.
- the same components as in Embodiment 6 (FIG. 22) are assigned the same reference numerals and explanations thereof are omitted.
- Residual spectrum S2(k) is input to spectrum modification section 1088 from frequency domain transform section 105, and the estimated value of the residual spectrum (estimated residual spectrum) S2′(k) is input from search section 1083. Spectrum modification section 1088 refers to the dynamic range of the high frequency part of residual spectrum S2(k) and modifies estimated residual spectrum S2′(k) so as to change the dynamic range of estimated residual spectrum S2′(k). Then, spectrum modification section 1088 encodes the modification information indicating how estimated residual spectrum S2′(k) has been modified, and outputs it to multiplexing section 1086. In addition, spectrum modification section 1088 outputs the estimated residual spectrum after modification (modified residual spectrum) to gain encoding section 1085. Note that the internal configuration of spectrum modification section 1088 is the same as that of spectrum modification section 1087, and a detailed description thereof is omitted.
- FIG. 29 shows the configuration of second layer decoding section 203 according to Embodiment 10 of the present invention.
- the same components as in Embodiment 6 (FIG. 24) are assigned the same reference numerals and explanations thereof are omitted.
- Modified spectrum generation section 2037 modifies decoded spectrum S′(k) input from filtering section 2033 based on the optimal modification information j_opt input from separation section 2032, that is, the optimal modification information related to the modified residual spectrum. Modified spectrum generation section 2037 is provided in correspondence with spectrum modification section 1088 on the speech encoding apparatus side.
- FIG. 30 shows the configuration of second layer encoding section 108 according to Embodiment 11 of the present invention.
- In FIG. 30, the same components as those in Embodiment 6 (FIG. 22) are denoted by the same reference numerals, and description thereof is omitted.
- Spectrum modification section 1087 modifies decoded spectrum S1(k) in accordance with predetermined modification information shared with the speech decoding apparatus, thereby changing the dynamic range of decoded spectrum S1(k). Then, spectrum modification section 1087 outputs modified decoded spectrum S1′(j, k) to internal state setting section 1081.
- FIG. 31 shows the configuration of second layer decoding section 203 according to Embodiment 11 of the present invention.
- the same components as in Embodiment 6 (FIG. 24) are assigned the same reference numerals and explanations thereof are omitted.
- Modified spectrum generation section 2036 modifies first layer decoded spectrum S1(k) input from first layer decoding section 202 in accordance with the predetermined modification information shared with the speech encoding apparatus, that is, the same predetermined modification information used by spectrum modification section 1087 in FIG. 30, and outputs the result to internal state setting section 2031.
- As described above, spectrum modification section 1087 of the speech encoding apparatus and modified spectrum generation section 2036 of the speech decoding apparatus perform modification processing in accordance with the same predetermined modification information, so it is not necessary to transmit the modification information from the speech encoding apparatus to the speech decoding apparatus. Therefore, according to the present embodiment, the bit rate can be reduced as compared with Embodiment 6.
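The point of this embodiment is that, because both sides hold the same predetermined modification information, the decoder can reproduce the encoder's modification without any side information. A schematic sketch, in which the shared gain values, thresholds, and names are placeholders:

```python
# Fixed modification information known to both encoder and decoder;
# nothing is transmitted for it.
SHARED_G1, SHARED_G2 = 0.75, 1.25  # illustrative gains

def apply_shared_modification(spectrum, th1, th2):
    """Apply the predetermined gains: SHARED_G1 above th1, SHARED_G2 below
    th2, components in between left unchanged (a simplified variant)."""
    out = []
    for s in spectrum:
        a = abs(s)
        if a > th1:
            out.append(SHARED_G1 * s)
        elif a < th2:
            out.append(SHARED_G2 * s)
        else:
            out.append(s)
    return out

# Encoder and decoder call the same function and obtain identical results.
encoder_side = apply_shared_modification([4.0, 1.0, 2.0], 3.0, 1.5)
decoder_side = apply_shared_modification([4.0, 1.0, 2.0], 3.0, 1.5)
```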
- By extending this scheme, the bit rate can be further reduced. FIG. 32 shows the configuration of second layer encoding section 108 in this case, and FIG. 33 shows the configuration of second layer decoding section 203 in this case.
- Second layer encoding section 108 described above can also be used in Embodiment 2 (FIG. 11), Embodiment 3 (FIG. 13), Embodiment 4 (FIG. 15), and Embodiment 5 (FIG. 17). In Embodiments 4 and 5, frequency domain transform is performed after up-sampling the first layer decoded signal, so that the frequency band of first layer decoded spectrum S1(k) is 0 ≤ k < FH; however, the band FL ≤ k < FH does not contain valid signal components. Therefore, also in these embodiments, the band of first layer decoded spectrum S1(k) can be handled as 0 ≤ k < FL.
- Further, second layer encoding section 108 described above can also be used for second layer encoding in speech encoding apparatuses other than the speech encoding apparatuses described in Embodiments 2 to 5.
- In the above embodiments, multiplexing section 109 multiplexes the first layer encoded data, the second layer encoded data, and the LPC coefficient encoded data to generate a bit stream; however, the present invention is not limited to this. The pitch coefficient, the index, and the like may be input directly to multiplexing section 109 and multiplexed with the first layer encoded data, without providing multiplexing section 1086 in second layer encoding section 108.
- Similarly, in the above embodiments, the second layer encoded data separated from the bit stream by separation section 201 is input to separation section 2032 in second layer decoding section 203; however, the present invention is not limited to this. Second layer decoding section 203 may be configured without separation section 2032, and separation section 201 may directly separate the bit stream into the pitch coefficient, the index, and the like and input them to second layer decoding section 203.
- Further, although the above description has taken as an example the case where the number of layers of the scalable coding is two, the present invention is not limited to this and can also be applied to scalable coding with three or more layers.
- Further, although the above embodiments have been described taking a speech signal as an example, the present invention is not limited to this and can also be applied to an audio signal.
- The speech encoding apparatus and speech decoding apparatus according to the above embodiments can be provided in a radio communication mobile station apparatus and a radio communication base station apparatus used in a mobile communication system, whereby quality degradation of speech in mobile communication can be prevented.
- Note that the radio communication mobile station apparatus may be referred to as a UE, and the radio communication base station apparatus may be referred to as a Node B.
- Each functional block used in the description of the above embodiments is typically realized as an LSI, which is an integrated circuit. These blocks may be individually made into single chips, or some or all of them may be integrated into a single chip. Although the term LSI is used here, the circuit may also be called an IC, a system LSI, a super LSI, or an ultra LSI, depending on the degree of integration.
- Further, the method of circuit integration is not limited to LSI; it may be realized with a dedicated circuit or a general-purpose processor. A field programmable gate array (FPGA) that can be programmed after LSI manufacture, or a reconfigurable processor in which the connections and settings of the circuit cells inside the LSI can be reconfigured, may also be used.
- the present invention can be applied to applications such as a radio communication mobile station apparatus and a radio communication base station apparatus used in a mobile communication system.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
BRPI0616624-5A BRPI0616624A2 (pt) | 2005-09-30 | 2006-09-29 | aparelho de codificação de fala e método de codificação de fala |
CN2006800353558A CN101273404B (zh) | 2005-09-30 | 2006-09-29 | 语音编码装置以及语音编码方法 |
JP2007537696A JP5089394B2 (ja) | 2005-09-30 | 2006-09-29 | 音声符号化装置および音声符号化方法 |
US12/088,300 US8396717B2 (en) | 2005-09-30 | 2006-09-29 | Speech encoding apparatus and speech encoding method |
EP06810844A EP1926083A4 (en) | 2005-09-30 | 2006-09-29 | AUDIOCODING DEVICE AND AUDIOCODING METHOD |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2005-286533 | 2005-09-30 | ||
JP2005286533 | 2005-09-30 | ||
JP2006199616 | 2006-07-21 | ||
JP2006-199616 | 2006-07-21 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2007037361A1 true WO2007037361A1 (ja) | 2007-04-05 |
Family
ID=37899782
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2006/319438 WO2007037361A1 (ja) | 2005-09-30 | 2006-09-29 | 音声符号化装置および音声符号化方法 |
Country Status (8)
Country | Link |
---|---|
US (1) | US8396717B2 (ja) |
EP (1) | EP1926083A4 (ja) |
JP (1) | JP5089394B2 (ja) |
KR (1) | KR20080049085A (ja) |
CN (1) | CN101273404B (ja) |
BR (1) | BRPI0616624A2 (ja) |
RU (1) | RU2008112137A (ja) |
WO (1) | WO2007037361A1 (ja) |
Cited By (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010016271A1 (ja) | 2008-08-08 | 2010-02-11 | パナソニック株式会社 | スペクトル平滑化装置、符号化装置、復号装置、通信端末装置、基地局装置及びスペクトル平滑化方法 |
US8731909B2 (en) | 2008-08-08 | 2014-05-20 | Panasonic Corporation | Spectral smoothing device, encoding device, decoding device, communication terminal device, base station device, and spectral smoothing method |
US9691410B2 (en) | 2009-10-07 | 2017-06-27 | Sony Corporation | Frequency band extending device and method, encoding device and method, decoding device and method, and program |
US9679580B2 (en) | 2010-04-13 | 2017-06-13 | Sony Corporation | Signal processing apparatus and signal processing method, encoder and encoding method, decoder and decoding method, and program |
US10546594B2 (en) | 2010-04-13 | 2020-01-28 | Sony Corporation | Signal processing apparatus and signal processing method, encoder and encoding method, decoder and decoding method, and program |
US10381018B2 (en) | 2010-04-13 | 2019-08-13 | Sony Corporation | Signal processing apparatus and signal processing method, encoder and encoding method, decoder and decoding method, and program |
US10297270B2 (en) | 2010-04-13 | 2019-05-21 | Sony Corporation | Signal processing apparatus and signal processing method, encoder and encoding method, decoder and decoding method, and program |
US10224054B2 (en) | 2010-04-13 | 2019-03-05 | Sony Corporation | Signal processing apparatus and signal processing method, encoder and encoding method, decoder and decoding method, and program |
US10339938B2 (en) | 2010-07-19 | 2019-07-02 | Huawei Technologies Co., Ltd. | Spectrum flatness control for bandwidth extension |
KR20190034361A (ko) * | 2010-07-19 | 2019-04-01 | Dolby International AB | Processing of audio signals during high frequency reconstruction |
KR101709095B1 (ko) | 2010-07-19 | 2017-03-08 | Dolby International AB | Processing of audio signals during high frequency reconstruction |
US9640184B2 (en) | 2010-07-19 | 2017-05-02 | Dolby International Ab | Processing of audio signals during high frequency reconstruction |
US12002476B2 (en) | 2010-07-19 | 2024-06-04 | Dolby International Ab | Processing of audio signals during high frequency reconstruction |
US11568880B2 (en) | 2010-07-19 | 2023-01-31 | Dolby International Ab | Processing of audio signals during high frequency reconstruction |
US11031019B2 (en) | 2010-07-19 | 2021-06-08 | Dolby International Ab | Processing of audio signals during high frequency reconstruction |
KR102026677B1 (ko) | 2010-07-19 | 2019-09-30 | Dolby International AB | Processing of audio signals during high frequency reconstruction |
KR101803849B1 (ko) | 2010-07-19 | 2017-12-04 | Dolby International AB | Processing of audio signals during high frequency reconstruction |
KR20130127552A (ko) * | 2010-07-19 | 2013-11-22 | Dolby International AB | Processing of audio signals during high frequency reconstruction |
US9911431B2 (en) | 2010-07-19 | 2018-03-06 | Dolby International Ab | Processing of audio signals during high frequency reconstruction |
JP2015092254A (ja) * | 2010-07-19 | 2015-05-14 | Huawei Technologies Co., Ltd. | Spectrum flatness control for bandwidth extension |
US10283122B2 (en) | 2010-07-19 | 2019-05-07 | Dolby International Ab | Processing of audio signals during high frequency reconstruction |
JP2015111277A (ja) * | 2010-07-19 | 2015-06-18 | Dolby International AB | Processing of audio signals during high frequency reconstruction |
JP2012037582A (ja) * | 2010-08-03 | 2012-02-23 | Sony Corp | Signal processing apparatus and method, and program |
US10229690B2 (en) | 2010-08-03 | 2019-03-12 | Sony Corporation | Signal processing apparatus and method, and program |
US9406306B2 (en) | 2010-08-03 | 2016-08-02 | Sony Corporation | Signal processing apparatus and method, and program |
US9767814B2 (en) | 2010-08-03 | 2017-09-19 | Sony Corporation | Signal processing apparatus and method, and program |
US11011179B2 (en) | 2010-08-03 | 2021-05-18 | Sony Corporation | Signal processing apparatus and method, and program |
US9767824B2 (en) | 2010-10-15 | 2017-09-19 | Sony Corporation | Encoding device and method, decoding device and method, and program |
US9177563B2 (en) | 2010-10-15 | 2015-11-03 | Sony Corporation | Encoding device and method, decoding device and method, and program |
JP2012083678A (ja) * | 2010-10-15 | 2012-04-26 | Sony Corp | Encoding device and method, decoding device and method, and program |
US9536542B2 (en) | 2010-10-15 | 2017-01-03 | Sony Corporation | Encoding device and method, decoding device and method, and program |
US10236015B2 (en) | 2010-10-15 | 2019-03-19 | Sony Corporation | Encoding device and method, decoding device and method, and program |
US9875746B2 (en) | 2013-09-19 | 2018-01-23 | Sony Corporation | Encoding device and method, decoding device and method, and program |
US10692511B2 (en) | 2013-12-27 | 2020-06-23 | Sony Corporation | Decoding apparatus and method, and program |
US11705140B2 (en) | 2013-12-27 | 2023-07-18 | Sony Corporation | Decoding apparatus and method, and program |
JP2018077502A (ja) * | 2014-05-01 | 2018-05-17 | Nippon Telegraph and Telephone Corporation | Decoding device, method therefor, program, and recording medium |
CN108701467B (zh) * | 2015-12-14 | 2023-12-08 | Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung e.V. | Apparatus and method for processing an encoded audio signal |
US11862184B2 | 2015-12-14 | 2024-01-02 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for processing an encoded audio signal by upsampling a core audio signal to upsampled spectra with higher frequencies and spectral width |
CN108701467A (zh) * | 2015-12-14 | 2018-10-23 | Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung e.V. | Apparatus and method for processing an encoded audio signal |
Also Published As
Publication number | Publication date |
---|---|
US8396717B2 (en) | 2013-03-12 |
RU2008112137A (ru) | 2009-11-10 |
US20090157413A1 (en) | 2009-06-18 |
JP5089394B2 (ja) | 2012-12-05 |
JPWO2007037361A1 (ja) | 2009-04-16 |
EP1926083A1 (en) | 2008-05-28 |
CN101273404A (zh) | 2008-09-24 |
KR20080049085A (ko) | 2008-06-03 |
CN101273404B (zh) | 2012-07-04 |
BRPI0616624A2 (pt) | 2011-06-28 |
EP1926083A4 (en) | 2011-01-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5089394B2 (ja) | Speech encoding device and speech encoding method | |
US8315863B2 (en) | Post filter, decoder, and post filtering method | |
JP5173800B2 (ja) | Speech encoding device, speech decoding device, and methods thereof | |
JP6371812B2 (ja) | Encoding device and encoding method | |
JP4977471B2 (ja) | Encoding device and encoding method | |
JP5371931B2 (ja) | Encoding device, decoding device, and methods thereof | |
JP4859670B2 (ja) | Speech encoding device and speech encoding method | |
EP1806736B1 (en) | Scalable encoding apparatus, scalable decoding apparatus, and methods thereof | |
TWI576832B (zh) | Apparatus and method for generating a bandwidth-extended signal | |
JP6980871B2 (ja) | Signal encoding method and device, and signal decoding method and device | |
US20070156397A1 (en) | Coding equipment | |
WO2009081568A1 (ja) | Encoding device, decoding device, and encoding method | |
JP4976381B2 (ja) | Speech encoding device, speech decoding device, and methods thereof | |
JPWO2008072737A1 (ja) | Encoding device, decoding device, and methods thereof | |
JP5602769B2 (ja) | Encoding device, decoding device, encoding method, and decoding method | |
WO2006041055A1 (ja) | Scalable encoding device, scalable decoding device, and scalable encoding method | |
US20100017199A1 (en) | Encoding device, decoding device, and method thereof | |
JP4354561B2 (ja) | Audio signal encoding device and decoding device | |
RU2809981C1 (ru) | Audio decoder, audio encoder, and related methods using joint coding of scaling parameters for channels of a multi-channel audio signal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase | Ref document number: 200680035355.8; Country of ref document: CN |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
ENP | Entry into the national phase | Ref document number: 2007537696; Country of ref document: JP; Kind code of ref document: A |
WWE | Wipo information: entry into national phase | Ref document number: 12088300; Country of ref document: US |
WWE | Wipo information: entry into national phase | Ref document number: 1020087007649; Country of ref document: KR; Ref document number: 2006810844; Country of ref document: EP |
WWE | Wipo information: entry into national phase | Ref document number: 592/MUMNP/2008; Country of ref document: IN |
NENP | Non-entry into the national phase | Ref country code: DE |
WWE | Wipo information: entry into national phase | Ref document number: 2008112137; Country of ref document: RU |
ENP | Entry into the national phase | Ref document number: PI0616624; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20080331 |