WO2016017238A1 - Encoding method, apparatus, program, and recording medium - Google Patents
- Publication number
- WO2016017238A1 (PCT/JP2015/063989)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- encoding
- current frame
- frequency domain
- acoustic signal
- encoding process
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/22—Mode decision, i.e. based on audio signal content versus external parameters
- G10L19/20—Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- G10L19/0017—Lossless audio signal coding; Perfect reconstruction of coded audio signal by transmission of coding error
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/0204—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
- G10L19/06—Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
- G10L2019/0001—Codebooks
- G10L2019/0016—Codebook for LPC parameters
Definitions
- The present invention relates to an acoustic signal encoding technique, and in particular to an encoding technique that converts an acoustic signal into the frequency domain and encodes it.
- For encoding acoustic signals such as speech and music, methods that encode the input acoustic signal in the frequency domain are widely used. Examples of such methods are those described in Non-Patent Document 1 and Non-Patent Document 2.
- the encoding method described in Non-Patent Document 1 performs an encoding process using a spectrum envelope based on a coefficient that can be converted into a linear prediction coefficient.
- More specifically, the encoding method described in Non-Patent Document 1 encodes a coefficient that can be converted into a linear prediction coefficient, obtained from the input acoustic signal, to obtain a linear prediction coefficient code. It then normalizes the frequency domain coefficient sequence corresponding to the input acoustic signal with the spectrum envelope coefficient sequence corresponding to the quantized coefficient (the coefficient, convertible into a linear prediction coefficient, that corresponds to the linear prediction coefficient code), and encodes the resulting normalized coefficient sequence to obtain a normalized coefficient code.
- the coefficient that can be converted into the linear prediction coefficient is the linear prediction coefficient itself, a PARCOR coefficient (partial autocorrelation coefficient), or an LSP parameter.
- The encoding method described in Non-Patent Document 2 performs an encoding process that involves variable-length encoding of the difference between the logarithm of the average energy of the coefficients in each divided frequency region and the logarithm of the average energy of the adjacent frequency region.
- More specifically, the encoding method described in Non-Patent Document 2 divides the frequency domain coefficient sequence corresponding to the input acoustic signal into frequency regions that contain fewer samples at low frequencies and more samples at high frequencies, obtains the average energy of each frequency region, and quantizes that average energy on the logarithmic axis.
- The difference between this quantized value and the value obtained by quantizing, in the same manner on the logarithmic axis, the average energy of the adjacent frequency region is variable-length encoded.
- The average energy quantized on the logarithmic axis for each frequency region is also used to adaptively determine the number of quantization bits and the quantization step width for each frequency domain coefficient; each frequency domain coefficient is quantized accordingly and then variable-length encoded.
- With the encoding method of Non-Patent Document 2, when the undulation of the spectral envelope of the input acoustic signal is not large or the concentration of the spectrum is not high, the code amount of the average energy code obtained by variable-length encoding the differences in average energy is small, so the input acoustic signal can be encoded efficiently. However, when the undulation of the spectral envelope is large or the concentration of the spectrum is high, the code amount of the average energy code increases.
- With the encoding method of Non-Patent Document 1, the spectral envelope can be encoded efficiently by means of the coefficient that can be converted into a linear prediction coefficient. Therefore, when the undulation of the spectral envelope of the input acoustic signal is large or the concentration of the spectrum is high, the input acoustic signal can be encoded more efficiently than with the encoding method of Non-Patent Document 2. However, when the undulation is not large or the concentration is not high, it cannot be encoded as efficiently as with the encoding method of Non-Patent Document 2.
- Thus, depending on the characteristics of the input acoustic signal, the conventional encoding methods may not encode it efficiently.
- An object of the present invention is to provide an encoding method, apparatus, program, and recording medium that can encode efficiently regardless of the characteristics of the input acoustic signal and that yield a decoded acoustic signal that is unlikely to sound unnatural to the listener.
- An encoding method according to one aspect encodes an input acoustic signal, for each frame of a predetermined time interval, by an encoding process determined from among a plurality of encoding processes in the frequency domain. It includes a determination step that allows an encoding process different from that of the previous frame to be determined as the encoding process of the current frame.
- An encoding method according to another aspect likewise encodes an input acoustic signal, for each frame of a predetermined time interval, by an encoding process determined from among a plurality of encoding processes in the frequency domain. In its determination step, when at least one of the energy of the high-frequency component of the input acoustic signal of the previous frame and the energy of the high-frequency component of the input acoustic signal of the current frame is equal to or less than a predetermined threshold, an encoding process different from that of the previous frame may be determined as the encoding process of the current frame. Otherwise, the step decides, according to how sparse the high-frequency component of the input acoustic signal is, whether a different encoding process may be determined or whether the same encoding process as that of the previous frame is determined as the encoding process of the current frame.
- An encoding method according to a further aspect likewise encodes an input acoustic signal, for each frame of a predetermined time interval, by an encoding process determined from among a plurality of encoding processes in the frequency domain. The second of these encoding processes involves variable-length encoding of the difference between the logarithm of the average energy of the coefficients in each divided frequency region and the logarithm of the average energy of the adjacent frequency region.
- With a configuration in which, for each frame, any one of a plurality of encoding processes that encode in the frequency domain can be selected, a decoded acoustic signal that is unlikely to sound unnatural to the listener can be obtained.
- A block diagram illustrating the configuration of the decoding device. A diagram showing an example of the processing flow of the encoding method.
- the first embodiment is configured to encode a frequency domain coefficient sequence corresponding to an input acoustic signal for each frame in any of a plurality of different encoding processes that perform encoding processing in the frequency domain.
- the encoding process is switched only when the energy of the high frequency component of the frequency domain coefficient sequence corresponding to the input acoustic signal is small.
- Here, the energy of the high-frequency component of the input acoustic signal may be measured either as the magnitude of the energy of the high-frequency component itself or as the proportion of the energy of the input acoustic signal that the high-frequency component occupies, among other measures.
- the configuration of the encoding apparatus 300 is shown in FIG.
- the encoding apparatus 300 includes a frequency domain conversion unit 110, a determination unit 380, a first encoding unit 101, and a second encoding unit 201.
- the first encoding unit 101 includes, for example, a linear prediction analysis encoding unit 120, a spectrum envelope coefficient sequence generation unit 130, an envelope normalization unit 140, and a normalization coefficient encoding unit 150.
- the second encoding unit 201 includes, for example, an area dividing unit 220, an average log energy difference variable length encoding unit 240, and a coefficient encoding unit 250.
- The encoding apparatus 300 receives a time-domain digital speech or audio signal (hereinafter referred to as the input acoustic signal) in units of frames, each a predetermined time interval, and performs the following processing for each frame.
- Nt denotes the number of samples per frame.
- the encoding apparatus 300 performs processing of each step of the encoding method illustrated in FIG.
- N is the number of samples in the frequency domain, and is a positive integer.
- The conversion to the frequency domain may also be performed by a known conversion method other than the MDCT.
- When the first encoding unit 101, the second encoding unit 201, and the determination unit 380 use several kinds of frequency domain coefficient sequences, the frequency domain conversion unit 110 need only obtain each frequency domain coefficient sequence with the precision and by the method required by the unit that uses it. For example, when the first encoding unit 101 and the second encoding unit 201 use the MDCT coefficient sequence as the frequency domain coefficient sequence and the determination unit 380 uses the power spectrum sequence, the frequency domain conversion unit 110 may obtain both the MDCT coefficient sequence and the power spectrum sequence from the input acoustic signal.
- Similarly, when the determination unit 380 uses an energy sequence for each frequency band as its frequency domain coefficient sequence, the frequency domain conversion unit 110 may obtain the MDCT coefficient sequence and the energy sequence for each frequency band from the input acoustic signal.
- The same holds when, within the determination unit 380, it is the switchability determination unit 381 that uses the energy sequence for each frequency band while the first encoding unit 101 and the second encoding unit 201 use the MDCT coefficient sequence: the frequency domain conversion unit 110 need only obtain the MDCT coefficient sequence and the energy sequence for each frequency band from the input acoustic signal.
- <Determination unit 380> When at least one of the energy of the high-frequency component of the input acoustic signal of the previous frame and the energy of the high-frequency component of the input acoustic signal of the current frame is smaller than a predetermined threshold, the determination unit 380 allows an encoding process different from that of the previous frame to be determined as the encoding process of the current frame (step S380).
- In other words, at least when the energy of the high-frequency component of the input acoustic signal is small, the determination unit 380 allows the frequency domain coefficient sequence of the current frame to be encoded by an encoding process different from the one that encoded the frequency domain coefficient sequence of the previous frame; otherwise it does not allow this. It then performs switching control so that the frequency domain coefficient sequence of the current frame is encoded according to the determination result.
- The determination unit 380 includes, for example, a switchability determination unit 381, an adaptive encoding process determination unit 382, a switching determination unit 383, and a switching unit 384. An example of the determination unit 380 is described below.
- the determination unit 380 performs processing of each step illustrated in FIG.
- The switchability determination unit 381 determines that switching is possible when at least one of the energy of the high-frequency component of the input acoustic signal of the previous frame and the energy of the high-frequency component of the input acoustic signal of the current frame is smaller than a predetermined threshold. "Switching is possible" means that the frequency domain coefficient sequence of the current frame is allowed to be encoded by an encoding process different from the one that encoded the frequency domain coefficient sequence of the previous frame. Otherwise, the unit determines that switching is impossible, that is, that encoding the current frame by a different encoding process is not allowed. It outputs the determination result (step S381).
- M is a predetermined positive integer smaller than N.
- The switchability determination unit 381 then determines that switching is possible if at least one of the high-frequency energy $Eh_{f-1}$ of the previous frame and the high-frequency energy $Eh_f$ of the current frame is smaller than a predetermined threshold TH1, that is, if $Eh_{f-1} < TH1$ and/or $Eh_f < TH1$; otherwise it determines that switching is impossible. It outputs information indicating whether switching is possible (step S3812).
- The high-frequency energy $Eh_{f-1}$ of the previous frame that would be obtained in step S3811 of the current frame is the same as the high-frequency energy $Eh_f$ obtained in step S3811 of the previous frame. Therefore, if the calculated high-frequency energy $Eh_f$ is stored in the switchability determination unit 381 at least until the immediately following frame, the high-frequency energy $Eh_{f-1}$ of the previous frame does not need to be calculated again.
- When the ratio of the high-frequency energy to the total energy is used instead, the unit determines that switching is possible if at least one of the ratio $Eh_{f-1}$ for the previous frame and the ratio $Eh_f$ for the current frame is smaller than a predetermined threshold TH1, that is, if $Eh_{f-1} < TH1$ and/or $Eh_f < TH1$; otherwise it determines that switching is impossible. It outputs information indicating whether switching is possible (step S3812).
- Here too, the ratio $Eh_{f-1}$ for the previous frame that would be obtained in step S3811 of the current frame is the same as the ratio $Eh_f$ obtained in step S3811 of the previous frame. Therefore, if the calculated ratio $Eh_f$ is stored in the switchability determination unit 381 at least until the immediately following frame, the ratio $Eh_{f-1}$ for the previous frame does not need to be calculated again.
- In the examples above, the MDCT coefficient sequence is used to calculate the high-frequency energy and the ratio of the high-frequency energy to the total energy.
- These values may instead be obtained using the power spectrum sequence or the energy sequence for each frequency band.
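The switchability test of steps S3811 and S3812 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the text here does not reproduce the formula for the high-frequency energy, so the sketch assumes it is the sum of squared coefficients from a boundary index `m` upward. The names `high_band_energy`, `SwitchabilityJudge`, `m`, and `th1` are hypothetical, and the handling of the very first frame is also an assumption.

```python
from typing import Optional, Sequence


def high_band_energy(coeffs: Sequence[float], m: int) -> float:
    """Sum of squared coefficients from 0-based index m to the end.

    Assumption: the high-frequency component is the upper part of the
    coefficient sequence, starting at a hypothetical boundary index m.
    """
    return sum(x * x for x in coeffs[m:])


class SwitchabilityJudge:
    """Stores the previous frame's value so it is computed once per frame."""

    def __init__(self, m: int, th1: float) -> None:
        self.m = m
        self.th1 = th1
        self.prev_energy: Optional[float] = None  # Eh_{f-1}; None before frame 0

    def step(self, coeffs: Sequence[float]) -> bool:
        """Return True if switching is allowed for the current frame."""
        cur = high_band_energy(coeffs, self.m)  # Eh_f
        prev = self.prev_energy
        # Assumption: before any previous frame exists, only Eh_f is tested.
        switchable = cur < self.th1 or (prev is not None and prev < self.th1)
        self.prev_energy = cur  # reused as Eh_{f-1} on the next call
        return switchable
```

Calling `step` once per frame reuses the stored value as the previous-frame energy, which mirrors the remark that the previous frame's value need not be recomputed.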
- The adaptive encoding process determination unit 382 determines which of the encoding process of the first encoding unit 101 and the encoding process of the second encoding unit 201 the frequency domain coefficient sequence corresponding to the input acoustic signal of the current frame is suited to, and outputs the determination result (step S382).
- The adaptive encoding process determination unit 382 performs the process of each step illustrated in FIG.
- For example, suppose the encoding process of the first encoding unit 101 is the encoding process using a spectral envelope based on a coefficient that can be converted into a linear prediction coefficient, as exemplified in Non-Patent Document 1, and the encoding process of the second encoding unit 201 is the one exemplified in Non-Patent Document 2. Then, when the undulation of the spectral envelope of the input acoustic signal is large or the concentration of the spectrum is high, the frequency domain coefficient sequence corresponding to the input acoustic signal of the current frame is determined to be suited to the encoding process of the first encoding unit 101. When the undulation is not large or the concentration is not high, it is determined to be suited to the encoding process of the second encoding unit 201. The determination result is output.
- any method may be adopted as a method for estimating the undulation and concentration of the spectrum.
- As an example, a configuration that estimates the depth of the valleys of the spectrum or of its envelope is described here.
- When the valleys of the spectrum or its envelope are shallow, the noise floor is high.
- When the valleys of the spectrum or its envelope are deep, the noise floor is low.
- the frequency domain coefficient sequence for example, the power spectrum sequence, may be the target of processing by the adaptive encoding processing determination unit 382.
- The average value of power is $AVE_{XS}(q)$ obtained by equation (3).
- The logarithm of the average value of power is $AVE_{XS}(q)$ obtained by equation (3A).
- The adaptive encoding process determination unit 382 determines, for each element of the sequence $AVE_{XS}(1), AVE_{XS}(2), \ldots, AVE_{XS}(Q)$ of average power values or of logarithms of average power values, whether the element is smaller than both of its two adjacent elements, and counts the elements for which this holds (step S3823). That is, it obtains the number Vally of indices q that satisfy equation (4).
- The adaptive encoding process determination unit 382 then obtains the average value $E_V$ of the Vally values $AVE_{XS}(q)$ whose q satisfies equation (4), that is, the average value $E_V$ over the valley partial regions (step S3824).
- The adaptive encoding process determination unit 382 also obtains the average value $E$ of $AVE_{XS}(q)$ over all partial regions, that is, of the average power values or of their logarithms (step S3825).
- The adaptive encoding process determination unit 382 then compares the difference between the average value $E$ over all partial regions and the average value $E_V$ over the valley partial regions with a predetermined threshold TH2. When the difference is smaller than TH2, the valleys of the spectrum are estimated to be shallow, meaning that the spectral envelope has little undulation or the spectrum has low concentration, so the unit determines that the frequency domain coefficient sequence corresponding to the input acoustic signal of the current frame is suited to the encoding process of the second encoding unit 201.
- the adaptive encoding process determination unit 382 outputs information indicating which encoding process is appropriate (step S3826).
- the information on the suitable encoding process is also referred to as conformance information.
- The number of samples may differ from one partial coefficient sequence to another.
- $P_1, P_2, \ldots, P_Q$ preferably satisfy $P_1 \le P_2 \le \cdots \le P_Q$.
- Q is a positive integer.
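The valley-depth estimate of steps S3823 to S3826 can be sketched as below. Equations (3), (3A), and (4) are not reproduced in this text, so the sketch makes labeled assumptions: it uses the natural logarithm of the mean power per partial region, treats only interior regions as valley candidates, and returns the first encoder's process whenever the test for the second encoder fails. The function names and the handling of a spectrum with no valley are hypothetical.

```python
import math
from typing import List, Sequence


def band_log_means(power: Sequence[float], sizes: Sequence[int]) -> List[float]:
    """Log of the mean power of each partial region; sizes are P_1..P_Q."""
    out, start = [], 0
    for p in sizes:
        band = power[start:start + p]
        # A tiny floor avoids log(0) for silent regions (an assumption).
        out.append(math.log(sum(band) / p + 1e-12))
        start += p
    return out


def suited_encoder(power: Sequence[float], sizes: Sequence[int], th2: float) -> int:
    """Return 2 if the spectrum looks flat (shallow valleys), else 1."""
    ave = band_log_means(power, sizes)
    # Valley: a region whose value is below both neighbours.
    # Assumption: the two end regions, which have one neighbour, are skipped.
    valleys = [ave[q] for q in range(1, len(ave) - 1)
               if ave[q] < ave[q - 1] and ave[q] < ave[q + 1]]
    if not valleys:
        return 2  # assumption: no valley at all is treated as a flat spectrum
    e_all = sum(ave) / len(ave)             # E over all partial regions
    e_valley = sum(valleys) / len(valleys)  # E_V over valley regions
    return 2 if (e_all - e_valley) < th2 else 1
```

Because the comparison uses logarithms, TH2 in this sketch acts as a ratio between the overall level and the valley level rather than an absolute power difference.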
- From the switchability information obtained by the switchability determination unit 381 and the information, obtained by the adaptive encoding process determination unit 382, on which encoding process is suitable, the switching determination unit 383 determines whether the frequency domain coefficient sequence of the current frame is to be encoded by the first encoding unit 101 or by the second encoding unit 201, and outputs a switching code, which is a code that identifies the determined encoding process (step S383). The output switching code is input to the decoding device 400.
- When switching is impossible, the switching determination unit 383 determines that the frequency domain coefficient sequence of the current frame is encoded by the same encoding process as that of the previous frame, regardless of which encoding process the current frame is suited to.
- When switching is possible, it determines that the frequency domain coefficient sequence of the current frame is encoded by the encoding process that the current frame is suited to, regardless of the encoding process of the previous frame. However, even when switching is possible, there may be cases in which it determines that the current frame is encoded by the same encoding process as the previous frame rather than by the encoding process the current frame is suited to.
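The decision rule described above can be sketched as a small function. This is an illustration under assumptions: encoders are identified by the integers 1 and 2, the returned value stands in for the switching code, `keep_previous` is a hypothetical flag for the optional case in which the previous process is retained although switching is allowed, and the encoder used before the first frame is an arbitrary choice.

```python
from typing import Iterable, List, Tuple


def decide_encoder(prev_encoder: int, switchable: bool,
                   suited_encoder: int, keep_previous: bool = False) -> int:
    """Pick encoder 1 or 2 for the current frame."""
    if not switchable or keep_previous:
        # Switching is not allowed (or is declined): stay with the previous one.
        return prev_encoder
    # Switching is allowed: follow the suitability decision.
    return suited_encoder


def run(frames: Iterable[Tuple[bool, int]], first_encoder: int = 1) -> List[int]:
    """frames yields (switchable, suited_encoder) pairs, one per frame."""
    prev, chosen = first_encoder, []
    for switchable, suited in frames:
        prev = decide_encoder(prev, switchable, suited)
        chosen.append(prev)  # plays the role of the switching code
    return chosen
```

For the frame sequence `[(False, 2), (True, 2), (False, 1), (True, 1)]` the encoder changes only on the frames where switching is allowed, even though the suitability decision changes one frame earlier.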
- For example, suppose that the encoding process of the first encoding unit 101 is the encoding process using a spectral envelope based on a coefficient that can be converted into a linear prediction coefficient, as exemplified in Non-Patent Document 1, and that the encoding process of the second encoding unit 201 is the encoding process involving variable-length encoding of the difference between the logarithm of the average energy of the coefficients in each divided frequency region and the logarithm of the average energy of the adjacent frequency region, as exemplified in Non-Patent Document 2.
- In that case, when the switchability information obtained by the switchability determination unit 381 indicates that switching is impossible, the switching determination unit 383 decides on the same encoding process as the one that encoded the MDCT coefficient sequence of the previous frame.
- When the switchability information indicates that switching is possible, the switching determination unit 383 decides according to the information, obtained by the adaptive encoding process determination unit 382, on which encoding process is suitable. For instance, when that information indicates the encoding process of the first encoding unit 101, the current frame is encoded by the first encoding unit 101.
- Both the first encoding unit 101 and the second encoding unit 201 encode a frequency domain coefficient sequence, but by encoding processes that differ from each other. That is, the first encoding unit 101 encodes the frequency domain coefficient sequence of the current frame by an encoding process different from that of the second encoding unit 201 and outputs the resulting code as the first code (step S101). The second encoding unit 201 encodes the frequency domain coefficient sequence of the current frame by an encoding process different from that of the first encoding unit 101 and outputs the resulting code as the second code (step S201).
- For example, the first encoding unit 101 performs an encoding process using a spectral envelope based on a coefficient that can be converted into a linear prediction coefficient, and the second encoding unit 201 performs an encoding process using the average energy of the coefficients in each divided frequency region.
- More specifically, the encoding process of the first encoding unit 101 is, for example, the encoding process using a spectral envelope based on a coefficient that can be converted into a linear prediction coefficient exemplified in Non-Patent Document 1, and the encoding process of the second encoding unit 201 is the encoding process, exemplified in Non-Patent Document 2, that involves variable-length encoding of the difference between the average energy of the coefficients in each divided frequency region and the average energy of the adjacent frequency region.
- The first encoding process, performed by the first encoding unit 101, expresses the spectral envelope shape in the frequency domain with a coefficient that can be converted into a linear prediction coefficient, as illustrated on the left in FIG.
- The second encoding process, performed by the second encoding unit 201, expresses the envelope shape with scale factor bands (a division of the frequency domain coefficient sequence into a plurality of regions), as illustrated on the right in FIG. Because the second encoding process variable-length encodes the differences between the average heights of the regions, it is very efficient when the average value changes smoothly from region to region.
- For each frame, one of the process of the first encoding unit 101 and the process of the second encoding unit 201, which are the plurality of encoding processes in the frequency domain, is performed.
- the first encoding unit 101 includes a linear prediction analysis encoding unit 120, a spectrum envelope coefficient sequence generation unit 130, an envelope normalization unit 140, and a normalization coefficient encoding unit 150.
- the output first code is input to the decoding device 400.
- the first encoding unit 101 is obtained by excluding a part that converts an input acoustic signal into a frequency domain coefficient sequence from the encoding process described in Non-Patent Document 1. That is, the encoding process performed by the frequency domain transform unit 110 and the first encoding unit 101 is the same as the encoding process described in Non-Patent Document 1.
- the coefficients are encoded to obtain and output a linear prediction coefficient code CL f and a coefficient that can be converted into a quantized linear prediction coefficient corresponding to the linear prediction coefficient code CL f (step S120).
- the coefficient that can be converted into the linear prediction coefficient is the linear prediction coefficient itself, a PARCOR coefficient (partial autocorrelation coefficient), or an LSP parameter.
- the second encoding unit 201 includes a region dividing unit 220, an average log energy difference variable length encoding unit 240, and a coefficient encoding unit 250.
- the output second code is input to the decoding device 400.
- the second encoding unit 201 is obtained by excluding the part that converts an input acoustic signal into a frequency domain coefficient sequence from the encoding process described in Non-Patent Document 2. That is, the encoding process performed by the frequency domain transform unit 110 and the second encoding unit 201 is the same as the encoding process described in Non-Patent Document 2.
- R and S 1 , ..., S R are positive integers. Assume that S 1 , ..., S R satisfy the relationship S 1 ≤ S 2 ≤ ... ≤ S R.
- the average log energy difference variable length encoding unit 240 obtains, for each partial region obtained by the region dividing unit 220, the average energy of the coefficients included in that partial region, quantizes the average energy of each partial region on the logarithmic axis, and variable-length encodes the difference between that quantized value and the quantized value on the logarithmic axis of the average energy of the adjacent partial region to obtain the average energy code CA f (step S240).
- the variable-length code is determined in advance so that the code amount is smaller when the absolute value of the difference is small than when the absolute value is large. That is, when the variation of the average log energy between regions is small, that is, when the undulation of the spectrum envelope is small or the concentration of the spectrum envelope is low, the code length of the average energy code CA f tends to be short.
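The per-region quantization and differencing described above can be sketched as follows. This is only an illustrative sketch, not the procedure of Non-Patent Document 2: the function name `average_log_energy_diffs`, the base-2 logarithm, the unit quantization step, the small energy floor, and the handling of the first region (stored as-is) are all assumptions, and the variable-length code table itself is not modeled.

```python
import math


def average_log_energy_diffs(coeffs, region_sizes, step=1.0):
    """Return (quantized log energies, adjacent differences) per partial region.

    coeffs       : frequency domain coefficient sequence of one frame
    region_sizes : S_1, ..., S_R, the number of coefficients per partial region
    step         : quantization step on the logarithmic axis (assumed value)
    """
    if sum(region_sizes) != len(coeffs):
        raise ValueError("region sizes must cover the whole coefficient sequence")

    quantized = []
    start = 0
    for size in region_sizes:
        region = coeffs[start:start + size]
        start += size
        energy = sum(c * c for c in region) / size
        # Floor avoids log(0) for an all-zero region (assumption, not from the source).
        log_energy = math.log2(max(energy, 1e-12))
        quantized.append(round(log_energy / step))

    # Assumption: the first region is kept as-is, later regions store the
    # difference to the previous (adjacent) region.
    diffs = [quantized[0]] + [b - a for a, b in zip(quantized, quantized[1:])]
    return quantized, diffs
```

With a flat spectrum the differences after the first entry are all zero, which is the case in which a variable-length code that favors small absolute values would produce a short average energy code.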
- the number of bits given as the code amount of the coefficient code CD f is distributed to each coefficient of each partial region coefficient sequence, taking into account the auditorily indiscernible difference between the logarithmic values of the spectral-level energy estimated in step S2501.
- a step width for scalar quantization of each coefficient in the coefficient sequence is obtained (step S2502).
- the coefficient encoding unit 250 quantizes each coefficient of each partial region coefficient sequence with the determined step width and the number of bits, and further variable-length encodes the integer value of each quantized coefficient to obtain the coefficient code CD f (step S2503).
- the configuration of the decoding device 400 is shown in FIG.
- the decoding device 400 includes a switching unit 480 and a first decoding unit 401 and a second decoding unit 501.
- the first decoding unit 401 includes, for example, a linear prediction decoding unit 420, a spectrum envelope coefficient sequence generation unit 430, a normalization coefficient decoding unit 450, and an envelope denormalization unit 440.
- the second decoding unit 501 includes, for example, an average log energy difference variable length decoding unit 540 and a coefficient decoding unit 550.
- a code including a switching code and an input code is input to the decoding device 400 in units of frames that are predetermined time intervals.
- in the case of a frame encoded by the first encoding unit 101, the input code includes the linear prediction coefficient code CL f and the normalization coefficient code CN f .
- in the case of a frame encoded by the second encoding unit 201, the input code includes the average energy code CA f and the coefficient code CD f .
- the switching unit 480 determines, from the input switching code, whether the input code of the current frame is to be decoded by the first decoding unit 401 or by the second decoding unit 501, and performs control so that the input code is input to the first decoding unit 401 or the second decoding unit 501 in accordance with the determined decoding process (step S480).
- if the input switching code is a code that identifies the encoding process of the first encoding unit 101, that is, a code that identifies an encoding process using a spectrum envelope based on a coefficient that can be converted into a linear prediction coefficient, the switching unit 480 performs control so that the input code is input to the first decoding unit 401, which performs the decoding process corresponding to the encoding process of the first encoding unit 101.
- if the input switching code is a code that identifies the encoding process of the second encoding unit 201, that is, a code that identifies an encoding process involving variable-length encoding of the difference between the average energy of the coefficients for each divided frequency domain and the average energy of the adjacent frequency domain, control is performed so that the input code is input to the second decoding unit 501, which performs a decoding process corresponding to the encoding process of the second encoding unit 201.
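The routing performed by the switching unit 480 can be sketched as a simple dispatch. This is a hedged sketch: the representation of the switching code as the integers 0 and 1, the function name `route_input_code`, and the callables standing in for the first decoding unit 401 and the second decoding unit 501 are assumptions made for illustration only.

```python
# Hypothetical switching-code values; the source does not define the bit pattern.
FIRST_PROCESS = 0   # encoding process of the first encoding unit 101
SECOND_PROCESS = 1  # encoding process of the second encoding unit 201


def route_input_code(switching_code, input_code, first_decoder, second_decoder):
    """Pass the input code of the current frame to the matching decoder (step S480)."""
    if switching_code == FIRST_PROCESS:
        # Frame was encoded by the first encoding unit 101.
        return first_decoder(input_code)
    if switching_code == SECOND_PROCESS:
        # Frame was encoded by the second encoding unit 201.
        return second_decoder(input_code)
    raise ValueError(f"unknown switching code: {switching_code!r}")
```

In use, `first_decoder` would consume the codes CL f and CN f and `second_decoder` the codes CA f and CD f, one frame at a time.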
- the first decoding unit 401 includes a linear prediction decoding unit 420, a spectrum envelope coefficient sequence generation unit 430, a normalization coefficient decoding unit 450, and an envelope denormalization unit 440.
- the linear prediction decoding unit 420 decodes the linear prediction coefficient code CL f included in the input code to obtain a coefficient that can be converted into a decoded linear prediction coefficient.
- the coefficient that can be converted into the decoded linear prediction coefficient is the same as the coefficient that can be converted into the quantized linear prediction coefficient obtained by the linear prediction analysis encoding unit 120 of the encoding device 300.
- the decoding process performed by the linear predictive decoding unit 420 corresponds to the encoding process performed by the linear prediction analysis encoding unit 120 of the encoding device 300.
- the coefficient that can be converted into the linear prediction coefficient is the linear prediction coefficient itself, a PARCOR coefficient (partial autocorrelation coefficient), an LSP parameter, or the like.
- N is the number of samples in the frequency domain, and is a positive integer.
- the second decoding unit 501 includes an average log energy difference variable length decoding unit 540 and a coefficient decoding unit 550.
- the decoding process performed by the average log energy difference variable length decoding unit 540 corresponds to the encoding process performed by the average log energy difference variable length encoding unit 240 of the encoding device 300.
- the difference in energy in the logarithmic region of each partial region is the same as the energy difference in the logarithmic region of each partial region obtained by the average logarithmic energy difference variable length encoding unit 240 of the encoding device 300.
- the same symbol DiffE XB (r) is used.
- the decoded value in the logarithmic region of average energy is the same as the quantized value in the logarithmic region of average energy obtained by the average logarithmic energy difference variable length encoding unit 240 of the encoding device 300, and therefore the same symbol Q (log (E XB (r))) is used.
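Recovering the quantized log-domain values from the decoded differences amounts to a running sum. The sketch below assumes, purely for illustration, that the first difference is taken relative to zero; the source does not state how the first partial region is handled, and the function name `decode_log_energies` is hypothetical.

```python
def decode_log_energies(diffs):
    """Rebuild Q(log(E_XB(r))) for r = 1..R from the differences DiffE_XB(r).

    Assumption: DiffE_XB(1) is relative to zero, so the running sum of the
    differences yields the quantized log energy of each partial region.
    """
    values = []
    running = 0
    for diff in diffs:
        running += diff
        values.append(running)
    return values
```

Because the decoder applies the same running sum to the same differences that the encoder produced, both sides arrive at identical quantized values, which is why the same symbol can be used for both.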
- the decoding process performed by the coefficient decoding unit 550 corresponds to the encoding process performed by the coefficient encoding unit 250 of the encoding apparatus 300.
- since the coefficient code CD f is obtained by the coefficient encoding unit 250 of the encoding device 300 variable-length encoding each coefficient of each partial region coefficient sequence, the code length of the code portion of the coefficient code CD f corresponding to each coefficient can be restored automatically.
- the quantization step width of each region is obtained from the decoded average energy Q (E XB (r)) obtained by the average log energy difference variable length decoding unit 540.
- Nt is the number of samples in the time domain, and is a positive integer.
- because the encoding process and the decoding process can be switched only when the high-frequency energy of the input acoustic signal is small, a decoded acoustic signal that is unlikely to sound unnatural to the listener can be obtained even when a plurality of encoding and decoding processes with different quantization characteristics for high-frequency components are implemented.
- because the encoding process suitable for the input acoustic signal can be selected, from between an encoding process using a spectral envelope based on a coefficient that can be converted into a linear prediction coefficient and an encoding process using the average energy of the coefficients for each divided frequency domain, without actually trying each encoding, an encoding process suitable for the input acoustic signal can be performed with a small amount of calculation.
- the encoding is further performed between an encoding process using a spectrum envelope based on a coefficient that can be converted into a linear prediction coefficient and an encoding process using an average energy of coefficients for each divided frequency domain.
- the frequency domain coefficient sequence of the current frame is always encoded by the same encoding process as the encoding process of the previous frame.
- in the second embodiment, even when the energy of the high-frequency component of the input acoustic signal is large, the coefficient sequence in the frequency domain of the current frame is allowed to be encoded by an encoding process different from that of the previous frame, depending on whether the high-frequency component of the input acoustic signal is sparse.
- the encoding device of the second embodiment allows an encoding process different from that of the previous frame to be determined as the encoding process of the current frame when the energy of the high-frequency component of the input acoustic signal is small; otherwise, depending on whether the high-frequency component of the input acoustic signal is sparse, it determines whether to allow an encoding process different from that of the previous frame to be determined as the encoding process of the current frame or to determine the same encoding process as that of the previous frame as the encoding process of the current frame.
- the configuration of the encoding device of the second embodiment is the same as that of the first embodiment shown in FIG.
- the encoding apparatus 300 according to the second embodiment is the same as the encoding apparatus 300 according to the first embodiment, except that the processes of the switchability determination unit 381 and the switching determination unit 383 in the determination unit 380 are different.
- the configuration of the decoding device of the second embodiment is the same as that of the first embodiment in FIG. 2, and the processing of each part is the same as that of the decoding device of the first embodiment.
- the switchability determination unit 381 and the switch determination unit 383 in the determination unit 380 that perform processing different from that of the encoding device 300 of the first embodiment will be described.
- the switchability determination unit 381 determines that switching is possible, that is, that the frequency domain coefficient sequence of the current frame can be encoded by an encoding process different from the one used to encode the frequency domain coefficient sequence of the previous frame, when at least one of the magnitude of the high-frequency component energy of the input acoustic signal of the previous frame and the magnitude of the high-frequency component energy of the input acoustic signal of the current frame is smaller than a predetermined threshold, and outputs the determination result (step S381). In all other cases, it does not determine whether switching is possible, and either outputs information indicating that neither is determined as the determination result or outputs no determination result. As the magnitude of the energy of the high-frequency component of the input acoustic signal, the high-frequency energy itself may be used, or, as in the first embodiment, the ratio of the high-frequency energy to the total energy may be used.
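The rule of step S381 in the second embodiment can be sketched as below. The function name `determine_switchability` and the use of `None` to stand for "neither is determined" are assumptions for illustration; the energy measure passed in may be either the high-frequency energy or its ratio to the total energy, as the text allows.

```python
def determine_switchability(prev_high_energy, cur_high_energy, threshold):
    """Sketch of step S381 (second embodiment).

    Returns True when switching is determined to be possible, and None when
    neither "possible" nor "not possible" is determined, in which case the
    decision is left to the sparsity-based processing of the switching
    determination unit 383.
    """
    if prev_high_energy < threshold or cur_high_energy < threshold:
        return True
    return None  # hypothetical encoding of "no determination result"
```

A caller would fall through to the high-frequency sparsity check only when the result is `None`.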
- the switching determination unit 383 determines whether to encode the coefficient sequence in the frequency domain of the current frame by the first encoding unit 101 or by the second encoding unit 201, based on the switchability information obtained by the switchability determination unit 381, the information obtained by the adaptive encoding process determination unit 382 indicating which encoding process is suitable, and whether the high-frequency component of the input acoustic signal, as obtained from the input acoustic signal, is sparse, and outputs a switching code, which is a code that can identify the determined encoding process (step S383B). The output switching code is input to the decoding device 400.
- the switching determination unit 383 performs the same processing as the switching determination unit 383 of the first embodiment.
- when the switchability information obtained by the switchability determination unit 381 indicates that neither is determined, or when no determination result is input from the switchability determination unit 381, that is, the high frequency component of the input acoustic signal
- the operation of the switching determination unit 383 when the energy of the high-frequency component of the input acoustic signal is large, which differs from that of the switching determination unit 383 of the first embodiment, will be described.
- the encoding process of the second encoding unit 201 is an encoding process involving variable-length encoding of the difference between the logarithmic value of the average energy of the coefficients for each divided frequency region and the logarithmic value of the average energy of the adjacent frequency region, as exemplified in Non-Patent Document 2.
- the switching determination unit 383 performs, for example, steps S3831B to S3836B in FIG.
- a region coefficient sequence for example, a power spectrum sequence, may be a target of processing by the switching determination unit 383.
- the logarithmic value of the average value of power for each partial coefficient sequence is AVE XS (q) obtained by the equation (3A).
- the logarithmic value of the average value of the power of the MDCT coefficient sequence is AVE Total obtained by Equation (9).
- the switching determination unit 383 obtains the number of AVE XS (q) satisfying expression (10) for q within a preset range from Q Low (where 1 ≤ Q Low ) to Q High (where Q Low ≤ Q High ≤ Q), that is, within one or more predetermined partial regions on the high-frequency side; this number is the number of peak regions (step S3834B).
- ⁇ and ⁇ are positive constants.
- if the number of peak regions is equal to or less than the threshold TH3, the switching determination unit 383 determines that the high-frequency component of the input sound signal of the current frame is sparse; if the number of peak regions exceeds the threshold TH3, it determines that the high-frequency component of the input sound signal of the current frame is not sparse (step S3835B).
- the threshold TH3 is a value determined by a predetermined rule so that it is larger when the high-frequency component of the input sound signal of a past frame close to the current frame is sparse than when it is not sparse.
- if the high-frequency component of the input sound signal of the past frame close to the current frame is sparse, a predetermined value TH3_1 is set as the threshold TH3; otherwise, a predetermined value TH3_2 that is smaller than TH3_1 is set as the threshold TH3.
- the past frame close to the current frame is, for example, the previous frame or the previous two frames. The determination result of whether or not the high frequency component of the input sound signal of the current frame is sparse is stored in the switching determination unit 383 until at least two frames later.
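The history-dependent threshold gives the sparsity decision a form of hysteresis, which can be sketched as follows. The peak-region count is taken as an input here because expression (10) is not reproduced in this text; the function names, the choice of the previous frame as the "past frame close to the current frame", and the initial state (not sparse) are assumptions for illustration.

```python
def is_high_band_sparse(num_peak_regions, recent_sparse, th3_1, th3_2):
    """Sketch of step S3835B: compare the peak-region count with TH3.

    TH3 is TH3_1 when a recent frame was sparse, and the smaller TH3_2 otherwise.
    """
    if th3_2 >= th3_1:
        raise ValueError("TH3_2 must be smaller than TH3_1")
    th3 = th3_1 if recent_sparse else th3_2
    return num_peak_regions <= th3


def track_sparsity(peak_counts, th3_1, th3_2):
    """Apply the decision frame by frame, using the previous frame as history."""
    decisions = []
    recent_sparse = False  # assumed initial state before the first frame
    for count in peak_counts:
        recent_sparse = is_high_band_sparse(count, recent_sparse, th3_1, th3_2)
        decisions.append(recent_sparse)
    return decisions
```

For example, with TH3_1 = 4 and TH3_2 = 2, a count of 3 is judged sparse right after a sparse frame but not sparse right after a non-sparse frame, so the decision does not flip on small fluctuations.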
- the switching determination unit 383 determines which of the first encoding unit 101 and the second encoding unit 201 encodes the coefficient sequence in the frequency domain of the current frame, based on the encoding process of the previous frame and on the determination results of whether the high-frequency component of the input acoustic signal is sparse for the current frame and for the past frame close to the current frame (step S3836B). That is, it determines whether encoding the coefficient sequence in the frequency domain of the current frame by an encoding process different from the encoding process of the previous frame is allowed.
- when encoding the coefficient sequence in the frequency domain of the current frame by an encoding process different from the encoding process of the previous frame is allowed, the switching determination unit 383 determines the encoding process of the coefficient sequence in the frequency domain of the current frame based on the information, obtained by the adaptive encoding process determination unit 382, indicating which encoding process is suitable. For example, when the switching determination unit 383 permits the encoding of the coefficient sequence in the frequency domain of the current frame by encoding processing different from the encoding processing of the previous frame, the switching determination unit 383 generates the MDCT coefficient sequence X f−1 of the previous frame.
- even when encoding the coefficient sequence in the frequency domain of the current frame by an encoding process different from the encoding process of the previous frame is allowed, if it is determined, from other information obtained by means (not shown) in the encoding device 300, that the coefficient sequence in the frequency domain of the current frame should be encoded by the same encoding process as that of the previous frame, the switching determination unit 383 may have the frequency domain coefficient sequence corresponding to the input acoustic signal of the current frame encoded by the same encoding process as that of the previous frame.
- the number of samples may be different for each partial coefficient sequence.
- P 1 , P 2 , ..., P Q preferably satisfy P 1 ≤ P 2 ≤ ... ≤ P Q.
- Q is a positive integer.
- the switching determination unit 383 need not perform step S3831B, step S3832B, or step S3833B, and may instead use the results of the processing performed by the adaptive encoding process determination unit 382.
- the encoding process suitable for the current frame is determined using one threshold, but the third embodiment performs the determination using two thresholds.
- the configuration of the encoding apparatus of the third embodiment is the same as that of the first embodiment shown in FIG.
- the encoding apparatus 300 according to the third embodiment is the same as the encoding apparatus 300 according to the first embodiment or the second embodiment, except that the processes of the adaptive encoding process determination unit 382 and the switching determination unit 383 in the determination unit 380 are different.
- the configuration of the decoding device of the third embodiment is the same as that of the first embodiment in FIG. 2, and the processing of each unit is the same as that of the decoding device of the first embodiment.
- the adaptive encoding process determination unit 382 and the switching determination unit 383 in the determination unit 380 that performs processing different from that of the encoding apparatus 300 according to the first embodiment will be described.
- the adaptive encoding process determination unit 382 performs the process of each step illustrated in FIG.
- the adaptive encoding process determination unit 382 determines whether the frequency domain coefficient sequence corresponding to the input sound signal of the current frame is suitable for the encoding process of the first encoding unit 101, is suitable for the encoding process of the second encoding unit 201, or may be subjected to either encoding process, and outputs the determination result (step S382A).
- the encoding process of the first encoding unit 101 is an encoding process using a spectrum envelope based on a coefficient that can be converted into a linear prediction coefficient exemplified in Non-Patent Document 1
- the adaptive encoding process determination unit 382 determines that the frequency domain coefficient sequence corresponding to the input acoustic signal of the current frame is suitable for the encoding process of the first encoding unit 101 when the undulation of the spectral envelope of the input acoustic signal is large and/or its concentration is high; that it is suitable for the encoding process of the second encoding unit 201 when the undulation of the spectral envelope of the input acoustic signal is small and/or its concentration is low; and that it may be subjected to either the encoding process of the first encoding unit 101 or the encoding process of the second encoding unit 201, that is, that it is suitable for both encoding processes, when the undulation of the spectral envelope of the input acoustic signal is medium and/or its concentration is medium. It then outputs the determination result.
- when the undulation of the spectral envelope of the input acoustic signal is medium and/or its concentration is medium, the switching determination unit 383 decides, as will be described later, to encode the frequency domain coefficient sequence of the current frame by the same encoding process as that of the previous frame. That is, the switching determination unit 383 determines the encoding process of the current frame so that the listener does not perceive unnaturalness caused by switching the encoding process between the previous frame and the current frame. Therefore, the case where the frequency domain coefficient sequence corresponding to the input sound signal of the current frame may be subjected to either the encoding process of the first encoding unit 101 or the encoding process of the second encoding unit 201 may include not only the case where the sequence is suitable for both encoding processes, but also cases where its suitability for either encoding process is not clear-cut, such as the case where the undulation of the spectral envelope of the input acoustic signal of the current frame is medium and/or its concentration is medium.
- the above-described determination that the sequence is suitable for both the encoding process of the first encoding unit 101 and the encoding process of the second encoding unit 201 may be read as a determination that its suitability for the encoding process of the first encoding unit 101 and for the encoding process of the second encoding unit 201 cannot be determined.
- any method may be adopted as a method for estimating the undulation and concentration of the spectrum, but a configuration for estimating the depth of the valley of the spectrum envelope will be described.
- in this configuration, when the valley of the spectral envelope is shallow, it is determined that the undulation of the spectrum is small and the concentration is low; when the valley of the spectral envelope is deep, it is determined that the undulation of the spectrum is large and the concentration is high; and when the depth of the valley of the spectral envelope is medium, it is determined that the undulation of the spectrum is medium and the concentration is medium.
- the adaptive encoding process determination unit 382 performs the same steps S3821 to S3825 as the adaptive encoding process determination unit 382 of the first embodiment, and step S3826A, which differs from the adaptive encoding process determination unit 382 of the first embodiment. The parts that differ from the adaptive encoding process determination unit 382 of the first embodiment are described below.
- after step S3825, the adaptive encoding process determination unit 382 performs the following determination process using the thresholds TH2_1 and TH2_2 and outputs the conformance information described later (step S3826A).
- when the difference between the average value E of AVE XS (q) over all partial regions and the average value E V of AVE XS (q) over the valley partial regions is smaller than a predetermined threshold TH2_1, the adaptive encoding process determination unit 382 estimates that the spectrum has shallow valleys, with a small spectral envelope undulation or a low concentration, and therefore determines that the frequency domain coefficient sequence corresponding to the input acoustic signal of the current frame is suitable for the encoding process of the second encoding unit 201.
- when the difference between the average value E of AVE XS (q) over all partial regions and the average value E V of AVE XS (q) over the valley partial regions is larger than a predetermined threshold TH2_2, which is a value larger than the threshold TH2_1, the adaptive encoding process determination unit 382 estimates that the spectrum has deep valleys, with a large spectral envelope undulation or a high concentration, and therefore determines that the frequency domain coefficient sequence corresponding to the input acoustic signal of the current frame is suitable for the encoding process of the first encoding unit 101.
- when the difference between the average value E of AVE XS (q) over all partial regions and the average value E V of AVE XS (q) over the valley partial regions is equal to or greater than the threshold TH2_1 and equal to or less than the threshold TH2_2, the adaptive encoding process determination unit 382 estimates that the depth of the spectral valleys is medium, with a medium spectral envelope undulation or a medium concentration, and therefore determines that the frequency domain coefficient sequence corresponding to the input sound signal of the current frame may be subjected to either the encoding process of the first encoding unit 101 or the encoding process of the second encoding unit 201, that is, that it is suitable for both encoding processes.
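The two-threshold decision of step S3826A can be sketched as a three-way classification. The function name `classify_suitability` and the string labels are hypothetical; the inputs are the averages E and E V described in the text, computed elsewhere, and TH2_1 < TH2_2 is taken from the statement that TH2_2 is the larger value.

```python
def classify_suitability(e_all, e_valley, th2_1, th2_2):
    """Sketch of step S3826A: classify by valley depth E - E_V.

    Returns "second" (shallow valleys), "first" (deep valleys), or
    "either" (medium depth, both encoding processes acceptable).
    """
    if th2_1 >= th2_2:
        raise ValueError("TH2_1 must be smaller than TH2_2")
    depth = e_all - e_valley
    if depth < th2_1:
        return "second"  # small undulation or low concentration
    if depth > th2_2:
        return "first"   # large undulation or high concentration
    return "either"      # TH2_1 <= depth <= TH2_2
```

Setting the two thresholds close together approaches the single-threshold behavior, while widening the gap enlarges the band in which the previous frame's encoding process is simply kept.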
- the adaptive encoding process determination unit 382 outputs adaptive information that is information of the adaptive encoding process.
- the conforming information is a determination result of the conforming encoding process determining unit 382, and can be said to be information indicating which or both encoding processes are conforming.
- the adaptive encoding process determination unit 382 may be configured to output the determination result when it determines that the frequency domain coefficient sequence corresponding to the input sound signal of the current frame is suitable for the encoding process of the first encoding unit 101 or for the encoding process of the second encoding unit 201, and not to output a determination result when it determines that the sequence may be subjected to either encoding process, in other words, that it is suitable for both the encoding process of the first encoding unit 101 and the encoding process of the second encoding unit 201.
- the switching determination unit 383 determines, from the switchability information obtained by the switchability determination unit 381 and the information, obtained by the adaptive encoding process determination unit 382, indicating which one or both of the encoding processes are suitable, that is, the information on the suitable encoding process (conformance information), whether the coefficient sequence in the frequency domain of the current frame is to be encoded by the first encoding unit 101 or by the second encoding unit 201, and outputs a switching code, which is a code that can identify the determined encoding process (step S383A). The output switching code is input to the decoding device 400.
- when switching is not possible, the switching determination unit 383 decides to encode the coefficient sequence in the frequency domain of the current frame by the same encoding process as that of the previous frame, regardless of which encoding process the current frame is suitable for.
- when switching is possible and the current frame is suitable for both the encoding process of the first encoding unit 101 and the encoding process of the second encoding unit 201, the switching determination unit 383 decides to encode the frequency domain coefficient sequence of the current frame by the same encoding process as that of the previous frame.
- when switching is possible and the current frame is suitable for only one of the encoding process of the first encoding unit 101 and the encoding process of the second encoding unit 201, the switching determination unit 383 decides, regardless of the encoding process of the previous frame, to encode the coefficient sequence in the frequency domain of the current frame by the encoding process for which the current frame is suitable.
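The three cases handled by the switching determination unit 383 in step S383A can be sketched as one small function. The labels "first", "second", and "either" and the function name `decide_encoding_process` are hypothetical; treating missing conformance information (`None`) like "either" follows the optional configuration described in the text.

```python
def decide_encoding_process(prev_process, switchable, suitability):
    """Sketch of step S383A (third embodiment).

    prev_process : "first" or "second", the process used for the previous frame
    switchable   : True when the switchability determination allows switching
    suitability  : "first", "second", "either", or None when no conformance
                   information was input (treated like "either")
    """
    if prev_process not in ("first", "second"):
        raise ValueError("prev_process must be 'first' or 'second'")
    if not switchable:
        return prev_process  # keep the previous process regardless of suitability
    if suitability in ("either", None):
        return prev_process  # both suitable: avoid an unnecessary switch
    if suitability in ("first", "second"):
        return suitability   # exactly one suitable: use it
    raise ValueError(f"unknown suitability: {suitability!r}")
```

The returned label would then be mapped to the switching code that is sent to the decoding device 400.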
- the encoding process of the first encoding unit 101 is an encoding process using a spectrum envelope based on a coefficient that can be converted into a linear prediction coefficient, as exemplified in Non-Patent Document 1, and the encoding process of the second encoding unit 201 is an encoding process involving variable-length encoding of the difference between the logarithmic value of the average energy of the coefficients for each divided frequency domain and the logarithmic value of the average energy of the adjacent frequency domain, as exemplified in Non-Patent Document 2.
- the switching determination unit 383 is information indicating whether or not the switchability information obtained by the switchability determination unit 381 indicates that switching is impossible and / or which encoding process obtained by the adaptive encoding process determination unit 382 is suitable.
- the switchability information obtained by the permission determination unit 381 indicates that switching is impossible
- the switchability information obtained by the permission determination unit 381 indicates that switching is not possible
- the information (conformance information) indicating which encoding process obtained by the encoding process determination unit 382 is suitable represents the encoding process of the second encoding unit 201
- the encoding process is determined.
- the switchability information obtained by the switchability determination unit 381 indicates that switching is possible, and the information (adaptation information) obtained by the adaptive encoding process determination unit 382, indicating which encoding process is suitable, represents the encoding process of the second encoding unit 201.
- the switchability information obtained by the switchability determination unit 381 indicates that switching is possible, and the information (adaptation information) obtained by the adaptive encoding process determination unit 382, indicating which encoding process is suitable, represents the encoding process of the first encoding unit 101.
- it is determined that the MDCT coefficient sequence X_f(n) (n = 1, ..., N) of the current frame is encoded by the first encoding unit 101.
- the adaptive encoding processing determination unit 382 determines that the frequency domain coefficient sequence corresponding to the input sound signal of the current frame is the encoding processing of the first encoding unit 101 or the encoding processing of the second encoding unit 201.
- the switching determination unit 383 may select any of the above-described codes when information on the suitable coding process is not input.
- it suffices to perform the process for the case where the information indicating which encoding process is suitable (adaptation information) indicates that both the encoding process of the first encoding unit 101 and the encoding process of the second encoding unit 201 are suitable.
- the input acoustic signal of the current frame is encoded using a spectral envelope based on a coefficient that can be converted into a linear prediction coefficient exemplified in Non-Patent Document 1, and for each divided frequency region exemplified in Non-Patent Document 2.
- the determination may take into account not only the magnitude of the undulation and the degree of concentration of the spectral envelope of the input acoustic signal but also other information.
- the frequency domain coefficient sequence corresponding to the input sound signal of the previous frame is encoded by the first encoding unit 101, the switchability determination unit 381 determines that switching is possible, and the adaptive encoding process determination unit 382 determines the current frame. Even if it is determined that the frequency domain coefficient sequence corresponding to the input acoustic signal of the second encoding unit 201 is compatible with the encoding process of the second encoding unit 201, the current information is obtained by other information obtained by means not shown in the encoding device 300.
- the frequency domain coefficient sequence corresponding to the input acoustic signal of the frame should be encoded by the encoding process of the first encoding unit 101
- the frequency domain coefficient sequence corresponding to the input acoustic signal of the current frame May be encoded by the first encoding unit 101. That is, the encoding apparatus 300 encodes the frequency domain coefficient sequence corresponding to the input sound signal of the previous frame by the first encoding unit 101, and the switchability determination unit 381 determines that the switch is possible.
- the determination unit 382 determines that the frequency domain coefficient sequence corresponding to the input acoustic signal of the current frame is suitable for the encoding process of the second encoding unit 201, the frequency domain coefficient corresponding to the input acoustic signal of the current frame Any configuration may be employed as long as it is possible to determine that the second encoding unit 201 encodes the column.
- the frequency domain coefficient sequence corresponding to the input sound signal of the previous frame is encoded by the second encoding unit 201, the switchability determination unit 381 determines that switching is possible, and the adaptive encoding process determination unit 382. Even when it is determined that the frequency domain coefficient sequence corresponding to the input sound signal of the current frame is suitable for the encoding process of the first encoding unit 101, other means (not shown) obtained by the encoding device 300 are obtained.
- the frequency domain corresponding to the input acoustic signal of the current frame May be encoded by the second encoding unit 201.
- the frequency domain coefficient sequence corresponding to the input acoustic signal of the previous frame is encoded by the second encoding unit, the switchability determination unit 381 determines that switching is possible, and the adaptive encoding process determination
- the unit 382 determines that the frequency domain coefficient sequence corresponding to the input acoustic signal of the current frame is suitable for the encoding process of the first encoding unit 101, the frequency domain coefficient sequence corresponding to the input acoustic signal of the current frame As long as the first encoding unit 101 can determine that encoding is possible.
- the frequency domain coefficient sequence corresponding to the input acoustic signal of the previous frame is encoded by the first encoding unit 101, and the adaptive encoding process determination unit 382 is displayed.
- the switchability determination unit 381 determines that switching is possible
- the second encoding unit 201 converts the frequency domain coefficient sequence corresponding to the input sound signal of the current frame based on other information obtained by a unit (not illustrated) in the encoding device 300.
- the second encoding unit 201 may encode the frequency domain coefficient sequence corresponding to the input acoustic signal of the current frame.
- the frequency domain coefficient sequence corresponding to the input acoustic signal of the previous frame is encoded by the second encoding unit 201, and the adaptive encoding processing determination unit 382 is the frequency domain corresponding to the input acoustic signal of the current frame.
- the switchability determination unit 381 can switch It is determined that the coefficient sequence in the frequency domain corresponding to the input sound signal of the current frame should be encoded by the encoding process of the first encoding unit 101 based on other information obtained by means not shown in the encoding device 300 In such a case, the first encoding unit 101 may encode the frequency domain coefficient sequence corresponding to the input sound signal of the current frame.
- the switchability determination unit 381 determines that switching is possible, and the adaptive encoding process determination unit 382 has the first frequency domain coefficient sequence corresponding to the input acoustic signal of the current frame.
- the coefficient sequence in the frequency domain corresponding to the input acoustic signal of the current frame is determined as the previous frame. Any configuration can be used as long as it can be determined to be encoded by the same encoding process.
- the determination unit 380 may not include the switchability determination unit 381.
- the switching determination unit 383 does not use the switchability information obtained by the switchability determination unit 381, but uses the frequency domain coefficient sequence of the current frame from the adaptation information obtained by the adaptive encoding process determination unit 382. Whether the first encoding unit 101 or the second encoding unit 201 performs encoding is determined, and a switching code that is a code that can identify the determined encoding process is output.
- the adaptive encoding process determination unit 382 determines that the frequency domain coefficient sequence corresponding to the input audio signal of the current frame is compatible with the encoding process of the first encoding unit 101, the input audio signal of the current frame Is encoded by the first encoding unit 101, and the adaptive encoding process determination unit 382 generates the frequency domain coefficient sequence corresponding to the input sound signal of the current frame as the code of the second encoding unit 201. If it is determined that it is compatible with the encoding process, the second encoding unit 201 may encode the frequency domain coefficient sequence corresponding to the input sound signal of the current frame.
- the determination including other information may be performed as in the first modification.
- the encoding apparatus 300 If it is determined that the coefficient sequence in the frequency domain corresponding to the input acoustic signal of the current frame is to be encoded by the encoding process of the second encoding unit 201 based on other information obtained by means not shown in FIG.
- the second encoding unit 201 may encode a frequency domain coefficient sequence corresponding to the input acoustic signal.
- the encoding apparatus If it is determined that the frequency domain coefficient sequence corresponding to the input sound signal of the current frame is to be encoded by the encoding process of the first encoding unit 101 based on other information obtained by means not shown in FIG.
- the first encoder 101 may encode the frequency domain coefficient sequence corresponding to the input acoustic signal of the frame.
- the adaptive encoding processing determination unit 382 determines that the frequency domain coefficient sequence corresponding to the input acoustic signal of the current frame is compatible with the encoding processing of the first encoding unit 101
- the adaptive encoding processing determination unit 382 Any configuration that can determine that the corresponding frequency domain coefficient sequence is to be encoded by the encoding process of the first encoding unit 101 may be used.
- the adaptive encoding process determination unit 382 determines that the frequency domain coefficient sequence corresponding to the input acoustic signal of the current frame is compatible with the encoding process of the second encoding unit 201
- the adaptive encoding process determination unit 382 Any configuration may be employed as long as it can determine that the corresponding frequency domain coefficient sequence is to be encoded by the encoding process of the second encoding unit 201.
- the adaptive encoding process determination unit 382 uses the frequency domain coefficient sequence corresponding to the input sound signal of the current frame as the encoding process of the first encoding unit 101.
- the frequency region corresponding to the input sound signal of the current frame by other information obtained by means not shown in the encoding device 300 Is determined by the encoding process of the first encoding unit 101, the frequency sequence coefficient sequence corresponding to the input acoustic signal of the current frame is encoded by the first encoding unit 101. May be.
- the adaptive encoding process determination unit 382 uses either one of the encoding process of the first encoding unit 101 and the encoding process of the second encoding unit 201 as the frequency domain coefficient sequence corresponding to the input acoustic signal of the current frame.
- the frequency domain corresponding to the input acoustic signal of the current frame is encoded by the second encoding unit 201 based on other information obtained by means not shown in the encoding device 300.
- the second encoding unit 201 may encode the frequency domain coefficient sequence corresponding to the input sound signal of the current frame.
- the adaptive encoding process determination unit 382 uses the frequency domain coefficient sequence corresponding to the input acoustic signal of the current frame as the encoding process of the first encoding unit 101 and the second code.
- the frequency domain coefficient sequence corresponding to the input sound signal of the current frame is encoded by the same encoding process as the previous frame Any configuration is possible.
- a value exactly equal to a threshold need only be assigned to one of the two cases that adjoin each other at that threshold. That is, a case defined as "equal to or greater than a threshold" may instead be defined as "greater than the threshold", with the complementary case "smaller than the threshold" defined as "equal to or less than the threshold".
- conversely, a case defined as "greater than a threshold" may instead be defined as "equal to or greater than the threshold", with the complementary case "equal to or less than the threshold" defined as "smaller than the threshold".
- the determination unit 380 may allow an encoding process different from that of the previous frame to be determined as the encoding process of the current frame when at least one of the magnitude of the high-frequency component energy of the input acoustic signal of the previous frame and the magnitude of the high-frequency component energy of the input acoustic signal of the current frame is equal to or less than a predetermined threshold (step S380).
- the switchability determination unit 381 may determine, when at least one of the magnitude of the high-frequency component energy of the input acoustic signal of the previous frame and the magnitude of the high-frequency component energy of the input acoustic signal of the current frame is equal to or less than a predetermined threshold, that switching is possible, that is, that the frequency domain coefficient sequence of the current frame may be encoded with an encoding process different from the one used for the frequency domain coefficient sequence of the previous frame, and may output the determination result.
- when the difference between the average value E of AVE_XS(q) over all partial regions and the average value E_V of AVE_XS(q) over the valley partial regions is smaller than a predetermined threshold TH2, the adaptive encoding process determination unit 382 estimates that the spectrum valleys are shallow and that the spectral envelope has small undulation or low concentration, and may therefore determine that the frequency domain coefficient sequence corresponding to the input acoustic signal of the current frame is suited to the encoding process of the second encoding unit 201.
- since the spectrum valleys are deep and the spectral envelope is estimated to have large undulation or high concentration, it may be determined that the frequency domain coefficient sequence corresponding to the input acoustic signal of the current frame is suited to the encoding process of the first encoding unit 101.
- when the difference between the average value E of AVE_XS(q) over all partial regions and the average value E_V of AVE_XS(q) over the valley partial regions is greater than a threshold TH2_2, which is itself greater than a threshold TH2_1, the adaptive encoding process determination unit 382 estimates that the spectrum valleys are deep and that the spectral envelope has large undulation or high concentration.
- the frequency domain coefficient sequence corresponding to the acoustic signal may be determined to be suitable for the encoding process of the first encoding unit 101.
- when the difference between the average value E of AVE_XS(q) over all partial regions and the average value E_V of AVE_XS(q) over the valley partial regions lies between the threshold TH2_1 and the threshold TH2_2, the adaptive encoding process determination unit 382 estimates that the spectrum valleys are of medium depth and that the spectral envelope has moderate undulation or medium concentration, and may determine that the frequency domain coefficient sequence corresponding to the input acoustic signal of the current frame may be encoded by either the encoding process of the first encoding unit 101 or the encoding process of the second encoding unit 201.
- the processes described for the encoding apparatus and the encoding method are not necessarily executed in time series in the order described; they may also be executed in parallel or individually, depending on the processing capability of the apparatus that executes them or as needed.
- each step in the encoding method is realized by a computer
- the processing content of the function that the encoding method should have is described by a program.
- each step is implement
- the program describing the processing contents can be recorded on a computer-readable recording medium.
- the computer-readable recording medium may be any recording medium, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.
- each processing means may be configured by executing a predetermined program on a computer, or at least a part of these processing contents may be realized by hardware.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Spectroscopy & Molecular Physics (AREA)
Abstract
Description
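A first embodiment of the present invention is described below. In a configuration that encodes, frame by frame, the frequency-domain coefficient sequence corresponding to the input acoustic signal with one of a plurality of different encoding processes operating in the frequency domain, the first embodiment switches the encoding process only when the energy of the high-frequency component of the input acoustic signal and/or of the frequency-domain coefficient sequence corresponding to the input acoustic signal is small. The energy of the high-frequency component of the input acoustic signal means, for example, the magnitude of the high-frequency component energy itself, or the share of the input acoustic signal's energy that the high-frequency component accounts for.
FIG. 1 shows the configuration of the encoding device 300. The encoding device 300 includes a frequency domain transform unit 110, a determination unit 380, a first encoding unit 101, and a second encoding unit 201. The first encoding unit 101 includes, for example, a linear prediction analysis encoding unit 120, a spectral envelope coefficient sequence generation unit 130, an envelope normalization unit 140, and a normalized coefficient encoding unit 150. The second encoding unit 201 includes, for example, a region division unit 220, an average log energy difference variable-length encoding unit 240, and a coefficient encoding unit 250. A time-domain speech/audio digital signal (hereinafter, the input acoustic signal) is input to the encoding device 300 in units of frames, each a predetermined time interval, and the following processing is performed for each frame. The specific processing of each unit is described below on the assumption that the current input acoustic signal is the f-th frame. The input acoustic signal of the f-th frame is denoted x_f(n) (n = 1, ..., Nt), where Nt is the number of samples per frame.
The frequency domain transform unit 110 transforms the input acoustic signal x_f(n) (n = 1, ..., Nt) into a frequency-domain coefficient sequence, for example an N-point MDCT coefficient sequence X_f(n) (n = 1, ..., N), and outputs it (step S110). Here N is the number of samples in the frequency domain and is a positive integer. The transform to the frequency domain may also be performed by a known transform method other than the MDCT.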
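As an illustration of step S110, the following is a minimal sketch of a direct-form MDCT. It assumes that the transform block holds 2N time samples and yields N coefficients; windowing, the overlap between successive blocks, and the relation between Nt and N are not specified in this passage and are left out. The function name `mdct` is hypothetical, and indices are 0-based, unlike the 1-based notation of the text.

```python
import math

def mdct(block):
    """Direct-form MDCT: 2N time samples -> N coefficients (no window applied)."""
    two_n = len(block)
    if two_n % 2:
        raise ValueError("block length must be even")
    n_coef = two_n // 2
    n0 = 0.5 + n_coef / 2.0  # phase offset of the standard MDCT definition
    return [
        sum(block[n] * math.cos(math.pi / n_coef * (n + n0) * (k + 0.5))
            for n in range(two_n))
        for k in range(n_coef)
    ]
```

The direct form costs O(N^2) operations per block; a practical encoder would normally use an FFT-based algorithm, but the short form makes the mapping from samples to coefficients easier to follow.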
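The determination unit 380 allows an encoding process different from that of the previous frame to be determined as the encoding process of the current frame when at least one of the magnitude of the high-frequency component energy of the input acoustic signal of the previous frame and the magnitude of the high-frequency component energy of the input acoustic signal of the current frame is smaller than a predetermined threshold (step S380).
When at least one of the magnitude of the high-frequency component energy of the input acoustic signal of the previous frame and the magnitude of the high-frequency component energy of the input acoustic signal of the current frame is smaller than a predetermined threshold, the switchability determination unit 381 determines that switching is permitted, that is, that the frequency-domain coefficient sequence of the current frame may be encoded with an encoding process different from the one used for the frequency-domain coefficient sequence of the previous frame. Otherwise it determines that switching is not permitted, that is, that the frequency-domain coefficient sequence of the current frame must not be encoded with an encoding process different from the one used for the previous frame. It then outputs the determination result (step S381).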
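The test performed in step S381 can be sketched as follows. This is an illustrative sketch only: the function names, the `split` index that marks where the high band starts, and the choice to measure energy on frequency-domain coefficients are assumptions not fixed by the text.

```python
def high_band_energy(coeffs, split):
    """Sum of squared coefficients from index `split` upward (hypothetical high band)."""
    return sum(c * c for c in coeffs[split:])

def switching_permitted(prev_hb_energy, cur_hb_energy, threshold):
    """Switching is permitted if either frame's high-band energy is below the threshold."""
    return prev_hb_energy < threshold or cur_hb_energy < threshold
```

For example, `switching_permitted(0.1, 5.0, 1.0)` permits switching because the previous frame's high-band energy is below the threshold, even though the current frame's is not.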
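The adaptive encoding process determination unit 382 determines whether the frequency-domain coefficient sequence corresponding to the input acoustic signal of the current frame is suited to the encoding process of the first encoding unit 101 or to the encoding process of the second encoding unit 201, and outputs the determination result (step S382).
From the switchability information obtained by the switchability determination unit 381 and the information on which encoding process is suitable obtained by the adaptive encoding process determination unit 382, the switching determination unit 383 decides whether the frequency-domain coefficient sequence of the current frame is encoded by the first encoding unit 101 or by the second encoding unit 201, and outputs a switching code, which is a code that identifies the decided encoding process (step S383). The output switching code is input to the decoding device 400. When switching is not permitted, the switching determination unit 383 decides to encode the frequency-domain coefficient sequence of the current frame with the same encoding process as the previous frame, whichever encoding process the current frame is suited to. When switching is permitted, it decides to encode the frequency-domain coefficient sequence of the current frame with the encoding process the current frame is suited to, whichever encoding process was used for the previous frame. Even when switching is permitted, however, the decision may include cases in which the current frame is encoded with the same encoding process as the previous frame rather than with the process it is suited to.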
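The basic rule of step S383 reduces to a two-branch decision. The sketch below assumes hypothetical labels `FIRST` and `SECOND` for the two encoding units and leaves out the optional case in which the previous process is kept even though switching is permitted.

```python
FIRST, SECOND = "first", "second"  # hypothetical labels for units 101 and 201

def decide_encoder(prev_encoder, permitted, suited):
    """Keep the previous process when switching is not permitted; otherwise follow suitability."""
    if not permitted:
        return prev_encoder
    return suited
```

The returned label plays the role of the information carried by the switching code; how that code is actually represented in the bitstream is not described in this passage.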
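The switching unit 384 performs control so that the MDCT coefficient sequence X_f(n) (n = 1, ..., N) output by the frequency domain transform unit 110 is input to the first encoding unit 101 or the second encoding unit 201, so that the MDCT coefficient sequence X_f(n) (n = 1, ..., N) of the current frame is encoded with the encoding process decided by the switching determination unit 383 (step S384). If the input acoustic signal x_f(n) (n = 1, ..., Nt) of the current frame is also needed to encode the MDCT coefficient sequence X_f(n) (n = 1, ..., N), the input acoustic signal x_f(n) (n = 1, ..., Nt) is also input to the first encoding unit 101 and/or the second encoding unit 201.
The first encoding unit 101 and the second encoding unit 201 both encode a frequency-domain coefficient sequence, but the encoding processes they perform differ from each other. That is, the first encoding unit 101 encodes the frequency-domain coefficient sequence of the current frame with an encoding process different from that of the second encoding unit 201 and outputs the resulting code as the first code (step S101). The second encoding unit 201 encodes the frequency-domain coefficient sequence of the current frame with an encoding process different from that of the first encoding unit 101 and outputs the resulting code as the second code (step S201). For example, the first encoding unit 101 performs an encoding process that uses a spectral envelope based on coefficients convertible into linear prediction coefficients, and the second encoding unit 201 performs an encoding process that uses the average energy of the coefficients in each divided frequency region.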
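The first encoding unit 101 includes a linear prediction analysis encoding unit 120, a spectral envelope coefficient sequence generation unit 130, an envelope normalization unit 140, and a normalized coefficient encoding unit 150. The MDCT coefficient sequence X_f(n) (n = 1, ..., N) and the input acoustic signal x_f(n) (n = 1, ..., Nt) of the current frame are input to the first encoding unit 101, which outputs a first code containing a linear prediction coefficient code CL_f and a normalized coefficient code CN_f. The output first code is input to the decoding device 400. The first encoding unit 101 corresponds to the encoding process described in Non-Patent Document 1 with the part that transforms the input acoustic signal into a frequency-domain coefficient sequence removed. In other words, the encoding performed by the frequency domain transform unit 110 together with the first encoding unit 101 is the same as the encoding process described in Non-Patent Document 1.
The linear prediction analysis encoding unit 120 performs linear prediction analysis on the input acoustic signal x_f(n) (n = 1, ..., Nt) to obtain coefficients convertible into linear prediction coefficients, encodes those coefficients, and obtains and outputs the linear prediction coefficient code CL_f and the quantized coefficients convertible into linear prediction coefficients that correspond to CL_f (step S120). Coefficients convertible into linear prediction coefficients are, for example, the linear prediction coefficients themselves, PARCOR coefficients (partial autocorrelation coefficients), or LSP parameters.
The spectral envelope coefficient sequence generation unit 130 obtains and outputs the power spectral envelope coefficient sequence W_f(n) (n = 1, ..., N) corresponding to the quantized coefficients convertible into linear prediction coefficients obtained by the linear prediction analysis encoding unit 120 (step S130).
Using the power spectral envelope coefficient sequence W_f(n) (n = 1, ..., N) obtained by the spectral envelope coefficient sequence generation unit 130, the envelope normalization unit 140 normalizes each coefficient X_f(n) (n = 1, ..., N) of the MDCT coefficient sequence obtained by the frequency domain transform unit 110 and outputs the normalized MDCT coefficient sequence XN_f(n) (n = 1, ..., N) (step S140). That is, the normalized MDCT coefficient sequence XN_f(n) (n = 1, ..., N) is the sequence of values obtained by dividing each coefficient of the MDCT coefficient sequence X_f(n) (n = 1, ..., N) by the corresponding coefficient of the power spectral envelope coefficient sequence W_f(n) (n = 1, ..., N).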
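Step S140 is an element-wise division, which can be sketched as below. The function name is hypothetical, and the check for non-positive envelope values is an added safeguard that the text does not mention.

```python
def normalize_by_envelope(coeffs, envelope):
    """Divide each MDCT coefficient X_f(n) by the matching envelope value W_f(n)."""
    if len(coeffs) != len(envelope):
        raise ValueError("coefficient and envelope sequences must have equal length")
    if any(w <= 0 for w in envelope):
        raise ValueError("envelope values must be positive")
    return [x / w for x, w in zip(coeffs, envelope)]
```

Dividing by the envelope flattens the spectrum, so the normalized coefficients that reach the normalized coefficient encoding unit 150 have a more uniform magnitude.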
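The normalized coefficient encoding unit 150 encodes the normalized MDCT coefficient sequence XN_f(n) (n = 1, ..., N) obtained by the envelope normalization unit 140 to obtain the normalized coefficient code CN_f (step S150).
The second encoding unit 201 includes a region division unit 220, an average log energy difference variable-length encoding unit 240, and a coefficient encoding unit 250. The MDCT coefficient sequence X_f(n) (n = 1, ..., N) of the current frame is input to the second encoding unit 201, which outputs a second code containing an average energy code CA_f and a coefficient code CD_f. The output second code is input to the decoding device 400. The second encoding unit 201 corresponds to the encoding process described in Non-Patent Document 2 with the part that transforms the input acoustic signal into a frequency-domain coefficient sequence removed. In other words, the encoding performed by the frequency domain transform unit 110 together with the second encoding unit 201 is the same as the encoding process described in Non-Patent Document 2.
The region division unit 220 divides the MDCT coefficient sequence X_f(n) (n = 1, ..., N) obtained by the frequency domain transform unit 110 into a plurality of partial regions such that lower-frequency partial regions contain fewer samples and higher-frequency partial regions contain more samples (step S220). Let R be the number of partial regions and S1, ..., SR the numbers of samples they contain. The coefficients X_f(n) (n = 1, ..., N) of the MDCT coefficient sequence are then assigned to the partial regions in order from the lowest-frequency sample, as XB_f^(1)(n) (n = 1, ..., S1), XB_f^(2)(n) (n = 1, ..., S2), ..., XB_f^(R)(n) (n = 1, ..., SR). R and S1, ..., SR are positive integers, and S1, ..., SR satisfy S1 ≦ S2 ≦ ... ≦ SR. The sequences XB_f^(1)(n) (n = 1, ..., S1), XB_f^(2)(n) (n = 1, ..., S2), ..., XB_f^(R)(n) (n = 1, ..., SR) are called partial region coefficient sequences.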
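The division of step S220 can be sketched as a slicing routine that checks the two stated constraints, namely that the region sizes are non-decreasing and sum to N. The function name and the example sizes are illustrative; the actual values of R and S1, ..., SR are not given in this passage.

```python
def split_into_regions(coeffs, sizes):
    """Split coefficients into consecutive regions whose sizes are non-decreasing."""
    if sum(sizes) != len(coeffs):
        raise ValueError("region sizes must sum to the number of coefficients")
    if any(a > b for a, b in zip(sizes, sizes[1:])):
        raise ValueError("region sizes must satisfy S1 <= S2 <= ... <= SR")
    regions, start = [], 0
    for size in sizes:
        regions.append(coeffs[start:start + size])
        start += size
    return regions
```

With seven coefficients and sizes `[1, 2, 4]`, the lowest region holds one coefficient and the highest holds four, which mirrors the narrower-at-low-frequency layout described above.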
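For each partial region obtained by the region division unit 220, the average log energy difference variable-length encoding unit 240 computes the average energy of the coefficients in the partial region, quantizes each partial region's average energy on a logarithmic axis, applies variable-length coding to the difference between that value and the log-axis quantized average energy of the adjacent partial region, and obtains the average energy code CA_f (step S240).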
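The quantities that step S240 passes to the variable-length coder can be sketched as follows. Several details are assumptions made for illustration: a base-2 logarithm, a quantization step of 1 on the log axis, a small floor to avoid taking the logarithm of zero, and sending the first region's value unchanged. The variable-length coding itself is omitted.

```python
import math

def quantized_log_energies(regions, floor=1e-12):
    """Mean energy of each region, quantized on a log2 axis by rounding."""
    values = []
    for region in regions:
        mean_energy = sum(c * c for c in region) / len(region)
        values.append(round(math.log2(max(mean_energy, floor))))
    return values

def adjacent_differences(values):
    """First value as-is, then the difference from the preceding region."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]
```

Because spectral energy usually changes gradually from one region to the next, the differences tend to cluster around zero, which is what makes variable-length coding of them effective.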
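Using the log-domain quantized values Q(log(E_XB(r))) (r = 1, ..., R) of the average energies obtained by the average log energy difference variable-length encoding unit 240, the coefficient encoding unit 250 encodes each coefficient of the partial region coefficient sequences XB_f^(1)(n) (n = 1, ..., S1), XB_f^(2)(n) (n = 1, ..., S2), ..., XB_f^(R)(n) (n = 1, ..., SR) obtained by the region division unit 220, for example by scalar quantization, and obtains the coefficient code CD_f (step S250). The quantization step size and the number of quantization bits used for this scalar quantization are determined for each partial region coefficient sequence from the quantized average energy Q(E_XB(r)) (r = 1, ..., R). The quantized average energy Q(E_XB(r)) (r = 1, ..., R) is obtained by converting the log-domain quantized value Q(log(E_XB(r))) (r = 1, ..., R) into a linear-domain value using Equation (7).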
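FIG. 2 shows the configuration of the decoding device 400. The decoding device 400 includes a switching unit 480, a first decoding unit 401, and a second decoding unit 501. The first decoding unit 401 includes, for example, a linear prediction decoding unit 420, a spectral envelope coefficient sequence generation unit 430, a normalized coefficient decoding unit 450, and an envelope denormalization unit 440. The second decoding unit 501 includes, for example, an average log energy difference variable-length decoding unit 540 and a coefficient decoding unit 550. A code containing a switching code and an input code is input to the decoding device 400 in units of frames, each a predetermined time interval. For a frame encoded by the first encoding unit 101, the input code contains the linear prediction coefficient code CL_f and the normalized coefficient code CN_f; for a frame encoded by the second encoding unit 201, the input code contains the average energy code CA_f and the coefficient code CD_f. The specific processing of each unit is described below on the assumption that the frame currently being processed is the f-th frame.
From the input switching code, the switching unit 480 decides whether the input code of the current frame is decoded by the first decoding unit 401 or by the second decoding unit 501, and performs control so that the input code is input to the first decoding unit 401 or the second decoding unit 501 and the decided decoding process can be carried out (step S480).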
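The routing of step S480 can be sketched as a simple dispatch. The numeric values 0 and 1 for the switching code and the callable decoder arguments are assumptions for illustration; the text does not state how the switching code is represented.

```python
def route_input_code(switching_code, input_code, first_decoder, second_decoder):
    """Send the frame's input code to the decoder that the switching code identifies."""
    if switching_code == 0:   # assumed value meaning "encoded by the first encoding unit"
        return first_decoder(input_code)
    if switching_code == 1:   # assumed value meaning "encoded by the second encoding unit"
        return second_decoder(input_code)
    raise ValueError("unknown switching code")
```

Since the switching code travels with every frame, the decoder needs no signal analysis of its own to follow the encoder's choice.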
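The first decoding unit 401 includes a linear prediction decoding unit 420, a spectral envelope coefficient sequence generation unit 430, a normalized coefficient decoding unit 450, and an envelope denormalization unit 440. The linear prediction coefficient code CL_f and the normalized coefficient code CN_f of the current frame are input to the first decoding unit 401, which outputs the frequency-domain coefficient sequence X_f(n) (n = 1, ..., N).
The linear prediction decoding unit 420 decodes the linear prediction coefficient code CL_f contained in the input code and obtains decoded coefficients convertible into linear prediction coefficients. These decoded coefficients are identical to the quantized coefficients convertible into linear prediction coefficients obtained by the linear prediction analysis encoding unit 120 of the encoding device 300. The decoding process performed by the linear prediction decoding unit 420 corresponds to the encoding process performed by the linear prediction analysis encoding unit 120 of the encoding device 300. Coefficients convertible into linear prediction coefficients are, for example, the linear prediction coefficients themselves, PARCOR coefficients (partial autocorrelation coefficients), or LSP parameters.
The spectral envelope coefficient sequence generation unit 430 obtains and outputs the power spectral envelope coefficient sequence W_f(n) (n = 1, ..., N) corresponding to the decoded coefficients convertible into linear prediction coefficients obtained by the linear prediction decoding unit 420. Here N is the number of samples in the frequency domain and is a positive integer.
The normalized coefficient decoding unit 450 decodes the input normalized coefficient code CN_f and obtains the decoded normalized MDCT coefficient sequence ^XN_f(n) (n = 1, ..., N) (step S450). The decoding process performed by the normalized coefficient decoding unit 450 corresponds to the encoding process performed by the normalized coefficient encoding unit 150 of the encoding device 300. That is, if the encoding device 300 performed a transform to a frequency domain other than the MDCT, ^XN_f(n) (n = 1, ..., N) is a coefficient sequence in that non-MDCT frequency domain corresponding to the transform used by the encoding device 300. The decoded normalized MDCT coefficient sequence ^XN_f(n) (n = 1, ..., N) corresponds to the normalized MDCT coefficient sequence XN_f(n) (n = 1, ..., N) input to the normalized coefficient encoding unit 150 of the encoding device 300, but because each coefficient contains quantization error it is written ^XN_f(n), that is, XN_f(n) with "^" attached.
Using the power spectral envelope coefficient sequence W_f(n) (n = 1, ..., N) obtained by the spectral envelope coefficient sequence generation unit 430, the envelope denormalization unit 440 denormalizes each coefficient ^XN_f(n) (n = 1, ..., N) of the decoded normalized MDCT coefficient sequence obtained by the normalized coefficient decoding unit 450 and outputs the decoded MDCT coefficient sequence ^X_f(n) (n = 1, ..., N) (step S440). That is, the decoded MDCT coefficient sequence ^X_f(n) (n = 1, ..., N) is the sequence of values obtained by multiplying each coefficient of the decoded normalized MDCT coefficient sequence ^XN_f(n) (n = 1, ..., N) by the corresponding coefficient of the power spectral envelope coefficient sequence W_f(n) (n = 1, ..., N).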
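Step S440 is an element-wise multiplication, sketched below with a hypothetical function name.

```python
def denormalize_by_envelope(decoded_normalized, envelope):
    """Multiply each decoded normalized coefficient by the matching envelope value W_f(n)."""
    if len(decoded_normalized) != len(envelope):
        raise ValueError("coefficient and envelope sequences must have equal length")
    return [xn * w for xn, w in zip(decoded_normalized, envelope)]
```

Because the decoder rebuilds the envelope from the same quantized linear prediction information as the encoder, this multiplication undoes the encoder's division up to the quantization error carried by the decoded normalized coefficients.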
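The second decoding unit 501 includes an average log energy difference variable-length decoding unit 540 and a coefficient decoding unit 550. The average energy code CA_f and the coefficient code CD_f of the current frame are input to the second decoding unit 501, which outputs the frequency-domain coefficient sequence X_f(n) (n = 1, ..., N).
The average log energy difference variable-length decoding unit 540 decodes the input average energy code CA_f and obtains the decoded average energy Q(E_XB(r)) (r = 1, ..., R) of each partial region (step S540). The decoded average energy is identical to the quantized average energy obtained in the coefficient encoding unit 250 of the encoding device 300, so the same symbol Q(E_XB(r)) is used.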
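Once the variable-length code has been parsed into integer differences, step S540 amounts to a running sum followed by a conversion out of the log domain. The sketch below assumes that the first value is transmitted as-is, that later values are differences from the preceding region, and that the logarithm is base 2 with a unit step; none of these details is fixed by this passage, and the actual conversion is the one given by Equation (7) of the description.

```python
from itertools import accumulate

def decode_log_energies(diffs):
    """Rebuild quantized log-domain energies from a first value and successive differences."""
    return list(accumulate(diffs))

def to_linear(log2_values):
    """Convert base-2 log-domain values to linear-domain energies (assumed mapping)."""
    return [2.0 ** v for v in log2_values]
```

For instance, the differences `[2, 2, -4]` rebuild the log-domain values `[2, 4, 0]`.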
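Using the decoded average energy Q(E_XB(r)) (r = 1, ..., R) obtained by the average log energy difference variable-length decoding unit 540, the coefficient decoding unit 550 decodes the coefficient code CD_f and obtains the decoded coefficient sequence ^X_f(n) (n = 1, ..., N) (step S550). The decoding process performed by the coefficient decoding unit 550 corresponds to the encoding process performed by the coefficient encoding unit 250 of the encoding device 300. Because the input coefficient code CD_f was obtained by variable-length coding of each coefficient of each partial region coefficient sequence in the coefficient encoding unit 250 of the encoding device 300, the code length of the portion of CD_f that corresponds to each coefficient can be recovered automatically. The quantization step size of each region is also obtained from the decoded average energy Q(E_XB(r)) produced by the average log energy difference variable-length decoding unit 540. Together these allow the frequency-domain decoded MDCT coefficient sequence ^X_f(n) (n = 1, ..., N) to be obtained from the coefficient code CD_f.
The time domain transform unit 410 transforms the N-point decoded MDCT coefficient sequence ^X_f(n) (n = 1, ..., N) into the time domain and obtains and outputs the decoded acoustic signal ^x_f(n) (n = 1, ..., Nt) (step S410). Here Nt is the number of samples in the time domain and is a positive integer. If the frequency domain transform unit 110 of the encoding device 300 used a transform other than the MDCT, the time domain transform corresponding to that transform is performed instead.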
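In the first embodiment, whenever the energy of the high-frequency component of the input acoustic signal is large, the frequency-domain coefficient sequence of the current frame is always encoded with the same encoding process as the previous frame. The second embodiment, by contrast, allows the frequency-domain coefficient sequence of the current frame to be encoded with an encoding process different from that of the previous frame even when the energy of the high-frequency component of the input acoustic signal is large, depending on whether the high-frequency component of the input acoustic signal is sparse.
When at least one of the magnitude of the high-frequency component energy of the input acoustic signal of the previous frame and the magnitude of the high-frequency component energy of the input acoustic signal of the current frame is smaller than a predetermined threshold, the switchability determination unit 381 determines that switching is permitted, that is, that the frequency-domain coefficient sequence of the current frame may be encoded with an encoding process different from the one used for the frequency-domain coefficient sequence of the previous frame, and outputs the determination result (step S381). In all other cases it determines neither that switching is permitted nor that it is not permitted, and either outputs, as the determination result, information indicating that no determination was made, or outputs no determination result. As in the first embodiment, the magnitude of the high-frequency component energy of the input acoustic signal may be the high-band energy itself or the ratio of the high-band energy to the total energy.
From the switchability information obtained by the switchability determination unit 381, the information on which encoding process is suitable obtained by the adaptive encoding process determination unit 382, and the state, derived from the input acoustic signal, of whether the high-frequency component of the input acoustic signal is sparse, the switching determination unit 383 decides whether the frequency-domain coefficient sequence of the current frame is encoded by the first encoding unit 101 or by the second encoding unit 201, and outputs a switching code, which is a code that identifies the decided encoding process (step S383B). The output switching code is input to the decoding device 400.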
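One possible reading of the second embodiment's permission rule is sketched below. It assumes that a sparse high band is what allows switching when neither frame's high-band energy is below the threshold; this passage says only that the outcome depends on the sparsity state, so the direction of the rule and the boolean `high_band_is_sparse` input are assumptions, and the sketch folds the roles of units 381 and 383 into one function.

```python
def switching_permitted_with_sparsity(prev_hb_energy, cur_hb_energy,
                                      threshold, high_band_is_sparse):
    """Permit switching on low high-band energy, else fall back to the sparsity state."""
    if prev_hb_energy < threshold or cur_hb_energy < threshold:
        return True
    # Assumed direction: a sparse high band permits switching despite large energy.
    return high_band_is_sparse
```

How sparsity is measured on the input acoustic signal is not covered here and would have to be taken from the rest of the description.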
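The first and second embodiments use a single threshold to determine which encoding process the current frame is suited to, whereas the third embodiment makes the determination using two thresholds.
The adaptive encoding process determination unit 382 performs the processing of the steps illustrated in FIG. 7. It determines whether the frequency-domain coefficient sequence corresponding to the input acoustic signal of the current frame is suited to the encoding process of the first encoding unit 101 or to the encoding process of the second encoding unit 201, or, put differently, whether either encoding process may be applied, and outputs the determination result (step S382A).
From the switchability information obtained by the switchability determination unit 381 and the information obtained by the adaptive encoding process determination unit 382 on whether one or both encoding processes are suitable, that is, the information on the suitable encoding process (adaptation information), the switching determination unit 383 decides whether the frequency-domain coefficient sequence of the current frame is encoded by the first encoding unit 101 or by the second encoding unit 201, and outputs a switching code, which is a code that identifies the decided encoding process (step S383A). The output switching code is input to the decoding device 400. When switching is not permitted, the switching determination unit 383 decides to encode the frequency-domain coefficient sequence of the current frame with the same encoding process as the previous frame, whichever encoding process the current frame is suited to. When switching is permitted and the current frame is suited to both the encoding process of the first encoding unit 101 and the encoding process of the second encoding unit 201, it decides to encode the frequency-domain coefficient sequence of the current frame with the same encoding process as the previous frame. When switching is permitted and the current frame is suited to only one of the encoding process of the first encoding unit 101 and the encoding process of the second encoding unit 201, it decides to encode the frequency-domain coefficient sequence of the current frame with the encoding process the current frame is suited to, whichever encoding process was used for the previous frame.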
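The two-threshold determination and the three-case decision of the third embodiment can be sketched together. The labels, function names, and the scalar `indicator` (a measure that grows with spectral undulation or concentration) are illustrative assumptions; the comparison directions follow the claims, where an indicator below the first threshold points to the second encoding process and an indicator at or above the second threshold points to the first.

```python
FIRST, SECOND, EITHER = "first", "second", "either"  # hypothetical labels

def classify_suitability(indicator, th_low, th_high):
    """Map an undulation/concentration indicator to the suited encoding process."""
    if indicator < th_low:
        return SECOND   # flat spectrum: average-energy based process
    if indicator >= th_high:
        return FIRST    # strongly shaped spectrum: spectral-envelope based process
    return EITHER       # in between: either process is acceptable

def decide_encoder(prev_encoder, permitted, suitability):
    """Keep the previous process unless switching is permitted and one process is clearly suited."""
    if not permitted or suitability == EITHER:
        return prev_encoder
    return suitability
```

The band between the two thresholds acts as hysteresis: a frame whose indicator hovers near a single threshold no longer flips the encoder back and forth, because the middle range always keeps the previous process.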
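In determining whether the input acoustic signal of the current frame is suited to the encoding process using a spectral envelope based on coefficients convertible into linear prediction coefficients, as exemplified in Non-Patent Document 1, or to the encoding process involving variable-length coding of the difference between the logarithm of the average energy of the coefficients in each divided frequency region and the logarithm of the average energy of the adjacent frequency region, as exemplified in Non-Patent Document 2, the determination may take into account not only the magnitude of the undulation and the degree of concentration of the spectral envelope of the input acoustic signal but also other information.
The switchability information obtained by the switchability determination unit 381 need not be used in deciding whether the frequency-domain coefficient sequence of the current frame is encoded by the first encoding unit 101 or by the second encoding unit 201. In that case, the determination unit 380 need not include the switchability determination unit 381.
For example, even when the adaptive encoding process determination unit 382 determines that the frequency-domain coefficient sequence corresponding to the input acoustic signal of the current frame is suited to the encoding process of the first encoding unit 101, if other information obtained by means (not shown) in the encoding device 300 leads to the determination that the frequency-domain coefficient sequence should be encoded with the encoding process of the second encoding unit 201, the frequency-domain coefficient sequence corresponding to the input acoustic signal of the current frame may be encoded by the second encoding unit 201.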
Claims (16)
- 入力音響信号を、所定時間区間のフレームごとに、周波数領域での複数の符号化処理のうちの決定された符号化処理で符号化する符号化方法であって、
前フレームの入力音響信号の高域成分のエネルギーの大きさと現フレームの入力音響信号の高域成分のエネルギーの大きさとの少なくとも何れかが所定の閾値以下の場合に、前フレームと異なる符号化処理を現フレームの符号化処理として決定することを可能とする決定ステップ
を含むことを特徴とする符号化方法。 - 入力音響信号を、所定時間区間のフレームごとに、周波数領域での複数の符号化処理のうちの決定された符号化処理で符号化する符号化方法であって、
前フレームの入力音響信号の高域成分のエネルギーの大きさと現フレームの入力音響信号の高域成分のエネルギーの大きさとの少なくとも何れかが所定の閾値以下の場合には、前フレームと異なる符号化処理を現フレームの符号化処理として決定することを可能とし、そうでない場合には、前記入力音響信号の高域成分が疎である状態に従って、前フレームと異なる符号化処理を現フレームの符号化処理として決定することを可能とするか、前フレームと同じ符号化処理を現フレームの符号化処理として決定するか、を決定する決定ステップ
を含むことを特徴とする符号化方法。 - 前記入力音響信号に対応する線形予測係数に変換可能な係数に基づくスペクトル包絡を用いて、前記入力音響信号に対応する周波数領域の係数列を符号化する第一符号化ステップと、
前記入力音響信号に対応する周波数領域の係数列について、区分した周波数領域ごとの係数の平均エネルギーの対数値を隣接する周波数領域の平均エネルギーの対数値との差分の可変長符号化を伴って符号化する第二符号化ステップと
を更に有し、
前記決定ステップは、更に、前フレームと異なる符号化処理を現フレームの符号化処理とすることを可能とされた場合のうち、前フレームの前記入力音響信号を第一符号化ステップで符号化し、かつ、現フレームの前記入力音響信号のスペクトルの起伏が大きいまたは集中度が高いことを示す指標が所定の閾値より小さい場合には、現フレームを第二符号化ステップで符号化することを決定する
ことを特徴とする請求項1または2に記載の符号化方法。 - 前記入力音響信号に対応する線形予測係数に変換可能な係数に基づくスペクトル包絡を用いて、前記入力音響信号に対応する周波数領域の係数列を符号化する第一符号化ステップと、
前記入力音響信号に対応する周波数領域の係数列について、区分した周波数領域ごとの係数の平均エネルギーの対数値を隣接する周波数領域の平均エネルギーの対数値との差分の可変長符号化を伴って符号化する第二符号化ステップと
を更に有し、
前記決定ステップは、更に、前フレームと異なる符号化処理を現フレームの符号化処理とすることを可能とされた場合のうち、前フレームの前記入力音響信号を第二符号化ステップで符号化し、かつ、現フレームの前記入力音響信号のスペクトルの起伏が大きいまたは集中度が高いことを示す指標が所定の閾値以上の場合には、現フレームを第一符号化ステップで符号化することを決定可能とする
ことを特徴とする請求項1または2に記載の符号化方法。 - 前記入力音響信号に対応する線形予測係数に変換可能な係数に基づくスペクトル包絡を用いて、前記入力音響信号に対応する周波数領域の係数列を符号化する第一符号化ステップと、
前記入力音響信号に対応する周波数領域の係数列について、区分した周波数領域ごとの係数の平均エネルギーの対数値を隣接する周波数領域の平均エネルギーの対数値との差分の可変長符号化を伴って符号化する第二符号化ステップと
を更に有し、
前記決定ステップは、更に、前フレームと異なる符号化処理を現フレームの符号化処理とすることを可能とされた場合のうち、前フレームの前記入力音響信号を第一符号化ステップで符号化し、かつ、現フレームの前記入力音響信号のスペクトルの起伏が大きいまたは集中度が高いことを示す指標が所定の第一の閾値より小さい場合には、現フレームを第二符号化ステップで符号化することを決定可能とし、前フレームと異なる符号化処理を現フレームの符号化処理とすることを可能とされた場合のうち、前フレームの前記入力音響信号を第二符号化ステップで符号化し、かつ、現フレームの前記入力音響信号のスペクトルの起伏が大きいまたは集中度が高いことを示す指標が、前記第一の閾値より大きな値である所定の第二の閾値以上の場合には、現フレームを第一符号化ステップで符号化することを決定可能とする
ことを特徴とする請求項1または2に記載の符号化方法。 - 入力音響信号を、所定時間区間のフレームごとに、周波数領域での複数の符号化処理のうちの決定された符号化処理で符号化する符号化方法であって、
前記入力音響信号に対応する線形予測係数に変換可能な係数に基づくスペクトル包絡を用いて、前記入力音響信号に対応する周波数領域の係数列を符号化する第一符号化ステップと、
前記入力音響信号に対応する周波数領域の係数列について、区分した周波数領域ごとの係数の平均エネルギーの対数値を隣接する周波数領域の平均エネルギーの対数値との差分の可変長符号化を伴って符号化する第二符号化ステップと、
現フレームの前記入力音響信号のスペクトルの起伏が大きい場合または集中度が高い場合には、現フレームを第一符号化ステップで符号化することを決定可能とし、現フレームの前記入力音響信号のスペクトルの起伏が小さい場合または集中度が低い場合には、現フレームを第二符号化ステップで符号化することを決定可能とする決定ステップと
を含むことを特徴とする符号化方法。 - 前記決定ステップは、更に、現フレームの前記入力音響信号のスペクトルの起伏が中程度の場合または集中度が中程度の場合には、前フレームと同じ符号化処理を現フレームの符号化処理として決定可能とする
ことを特徴とする請求項6に記載の符号化方法。 - 入力音響信号を、所定時間区間のフレームごとに、周波数領域での複数の符号化処理のうちの決定された符号化処理で符号化する符号化装置であって、
前フレームの入力音響信号の高域成分のエネルギーの大きさと現フレームの入力音響信号の高域成分のエネルギーの大きさとの少なくとも何れかが所定の閾値以下の場合に、前フレームと異なる符号化処理を現フレームの符号化処理として決定することを可能とする決定部
を含むことを特徴とする符号化装置。 - 入力音響信号を、所定時間区間のフレームごとに、周波数領域での複数の符号化処理のうちの決定された符号化処理で符号化する符号化装置であって、
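a determination unit that, when at least one of the magnitude of the energy of the high-frequency components of the input acoustic signal of the preceding frame and the magnitude of the energy of the high-frequency components of the input acoustic signal of the current frame is equal to or less than a predetermined threshold, makes it possible to determine an encoding process different from that of the preceding frame as the encoding process of the current frame, and that otherwise determines, according to the state of sparseness of the high-frequency components of the input acoustic signal, whether to make it possible to determine an encoding process different from that of the preceding frame as the encoding process of the current frame or to determine the same encoding process as that of the preceding frame as the encoding process of the current frame,
the encoding device being characterized by comprising the above.
- The encoding device according to claim 8 or 9, further comprising:
a first encoding unit that encodes a frequency-domain coefficient sequence corresponding to the input acoustic signal by using a spectral envelope based on coefficients convertible into linear prediction coefficients corresponding to the input acoustic signal; and
a second encoding unit that encodes the frequency-domain coefficient sequence corresponding to the input acoustic signal with variable-length encoding of the difference between the logarithm of the average energy of the coefficients in each divided frequency region and the logarithm of the average energy of an adjacent frequency region,
wherein, among the cases in which an encoding process different from that of the preceding frame is permitted as the encoding process of the current frame, when the input acoustic signal of the preceding frame was encoded by the first encoding unit and an index indicating that the undulation of the spectrum of the input acoustic signal of the current frame is large or its degree of concentration is high is smaller than a predetermined threshold, the determination unit further determines that the current frame is to be encoded by the second encoding unit.
- The encoding device according to claim 8 or 9, further comprising:
a first encoding unit that encodes a frequency-domain coefficient sequence corresponding to the input acoustic signal by using a spectral envelope based on coefficients convertible into linear prediction coefficients corresponding to the input acoustic signal; and
a second encoding unit that encodes the frequency-domain coefficient sequence corresponding to the input acoustic signal with variable-length encoding of the difference between the logarithm of the average energy of the coefficients in each divided frequency region and the logarithm of the average energy of an adjacent frequency region,
wherein, among the cases in which an encoding process different from that of the preceding frame is permitted as the encoding process of the current frame, when the input acoustic signal of the preceding frame was encoded by the second encoding unit and an index indicating that the undulation of the spectrum of the input acoustic signal of the current frame is large or its degree of concentration is high is equal to or greater than a predetermined threshold, the determination unit further makes it possible to determine that the current frame is to be encoded by the first encoding unit.
- The encoding device according to claim 8 or 9, further comprising:
a first encoding unit that encodes a frequency-domain coefficient sequence corresponding to the input acoustic signal by using a spectral envelope based on coefficients convertible into linear prediction coefficients corresponding to the input acoustic signal; and
a second encoding unit that encodes the frequency-domain coefficient sequence corresponding to the input acoustic signal with variable-length encoding of the difference between the logarithm of the average energy of the coefficients in each divided frequency region and the logarithm of the average energy of an adjacent frequency region,
wherein, among the cases in which an encoding process different from that of the preceding frame is permitted as the encoding process of the current frame, when the input acoustic signal of the preceding frame was encoded by the first encoding unit and an index indicating that the undulation of the spectrum of the input acoustic signal of the current frame is large or its degree of concentration is high is smaller than a predetermined first threshold, the determination unit further makes it possible to determine that the current frame is to be encoded by the second encoding unit, and, among the same cases, when the input acoustic signal of the preceding frame was encoded by the second encoding unit and the index is equal to or greater than a predetermined second threshold that is larger than the first threshold, the determination unit makes it possible to determine that the current frame is to be encoded by the first encoding unit.
- An encoding device that encodes an input acoustic signal, frame by frame in predetermined time segments, by an encoding process determined from among a plurality of encoding processes in the frequency domain, the encoding device being characterized by comprising:
a first encoding unit that encodes a frequency-domain coefficient sequence corresponding to the input acoustic signal by using a spectral envelope based on coefficients convertible into linear prediction coefficients corresponding to the input acoustic signal;
a second encoding unit that encodes the frequency-domain coefficient sequence corresponding to the input acoustic signal with variable-length encoding of the difference between the logarithm of the average energy of the coefficients in each divided frequency region and the logarithm of the average energy of an adjacent frequency region; and
a determination unit that makes it possible to determine that the current frame is to be encoded by the first encoding unit when the undulation of the spectrum of the input acoustic signal of the current frame is large or its degree of concentration is high, and makes it possible to determine that the current frame is to be encoded by the second encoding unit when the undulation of the spectrum of the input acoustic signal of the current frame is small or its degree of concentration is low.
- The encoding device according to claim 13, characterized in that the determination unit further makes it possible to determine the same encoding process as that of the preceding frame as the encoding process of the current frame when the undulation of the spectrum of the input acoustic signal of the current frame is moderate or its degree of concentration is moderate.
- A program for causing a computer to execute each step of the encoding method according to any one of claims 1 to 7.
- A computer-readable recording medium on which is recorded a program for causing a computer to execute each step of the encoding method according to any one of claims 1 to 7.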
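The frame-by-frame decision recited in the device claims above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the names `Coder`, `Frame`, `switching_allowed`, `choose_coder`, `energy_thr`, `thr1`, and `thr2` are hypothetical, the "index" of spectral undulation or concentration is taken as a precomputed number, and the sketch always takes a switch whenever the claims merely make one possible. The direction of the sparseness test (a sparse high band permits switching) is also an assumption, since the claim only says the decision follows the state of sparseness.

```python
from dataclasses import dataclass
from enum import Enum


class Coder(Enum):
    FIRST = "first"    # uses a spectral envelope from LPC-convertible coefficients
    SECOND = "second"  # variable-length codes differences of log band energies


@dataclass
class Frame:
    high_band_energy: float  # energy of the high-frequency components
    high_band_sparse: bool   # hypothetical flag: high band judged sparse
    peakiness: float         # index that grows with spectral undulation/concentration


def switching_allowed(prev: Frame, cur: Frame, energy_thr: float) -> bool:
    """Gate: may the current frame use a different process than the previous one?"""
    # "At least one of the two energies is at or below the threshold."
    if min(prev.high_band_energy, cur.high_band_energy) <= energy_thr:
        return True
    # ASSUMPTION: a sparse high band permits switching; a non-sparse one forbids it.
    return cur.high_band_sparse


def choose_coder(prev_coder: Coder, prev: Frame, cur: Frame,
                 energy_thr: float, thr1: float, thr2: float) -> Coder:
    """Pick the coder for the current frame, with hysteresis between thr1 < thr2."""
    if not thr1 < thr2:
        raise ValueError("thr2 must be larger than thr1")
    if not switching_allowed(prev, cur, energy_thr):
        return prev_coder
    if prev_coder is Coder.FIRST and cur.peakiness < thr1:
        return Coder.SECOND   # flat or spread-out spectrum
    if prev_coder is Coder.SECOND and cur.peakiness >= thr2:
        return Coder.FIRST    # undulating or concentrated spectrum
    return prev_coder         # in between: keep the previous process
```

Using two thresholds with `thr1 < thr2` gives hysteresis: an index that hovers between them leaves the previous choice in place, which avoids toggling the encoding process on every frame.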
Priority Applications (15)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020177002231A KR101993828B1 (ko) | 2014-07-28 | 2015-05-15 | Encoding method, device, program, and recording medium |
US15/327,490 US10304472B2 (en) | 2014-07-28 | 2015-05-15 | Method, device and recording medium for coding based on a selected coding processing |
JP2016538178A JP6411509B2 (ja) | 2014-07-28 | 2015-05-15 | Encoding method, device, program, and recording medium |
CN201580041465.4A CN106796801B (zh) | 2014-07-28 | 2015-05-15 | Encoding method, device, and recording medium |
EP15826810.2A EP3163571B1 (en) | 2014-07-28 | 2015-05-15 | Coding of a sound signal |
EP20200287.9A EP3796314B1 (en) | 2014-07-28 | 2015-05-15 | Coding of a sound signal |
EP19201443.9A EP3614382B1 (en) | 2014-07-28 | 2015-05-15 | Coding of a sound signal |
PL20200287T PL3796314T3 (pl) | 2014-07-28 | 2015-05-15 | Coding of a sound signal |
ES15826810T ES2770704T3 (es) | 2014-07-28 | 2015-05-15 | Coding of an acoustic signal |
PL15826810T PL3163571T3 (pl) | 2014-07-28 | 2015-05-15 | Coding of a sound signal |
KR1020197011029A KR102049294B1 (ko) | 2014-07-28 | 2015-05-15 | Encoding method, device, program, and recording medium |
KR1020197018004A KR102061316B1 (ko) | 2014-07-28 | 2015-05-15 | Encoding method, device, program, and recording medium |
US16/295,039 US10629217B2 (en) | 2014-07-28 | 2019-03-07 | Method, device, and recording medium for coding based on a selected coding processing |
US16/782,700 US11037579B2 (en) | 2014-07-28 | 2020-02-05 | Coding method, device and recording medium |
US16/782,725 US11043227B2 (en) | 2014-07-28 | 2020-02-05 | Coding method, device and recording medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014-152958 | 2014-07-28 | ||
JP2014152958 | 2014-07-28 |
Related Child Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/327,490 A-371-Of-International US10304472B2 (en) | 2014-07-28 | 2015-05-15 | Method, device and recording medium for coding based on a selected coding processing |
US16/295,039 Continuation US10629217B2 (en) | 2014-07-28 | 2019-03-07 | Method, device, and recording medium for coding based on a selected coding processing |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016017238A1 (ja) | 2016-02-04 |
Family
ID=55217142
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2015/063989 WO2016017238A1 (ja) | 2014-07-28 | 2015-05-15 | Encoding method, device, program, and recording medium |
Country Status (8)
Country | Link |
---|---|
US (4) | US10304472B2 (ja) |
EP (3) | EP3796314B1 (ja) |
JP (3) | JP6411509B2 (ja) |
KR (3) | KR101993828B1 (ja) |
CN (4) | CN106796801B (ja) |
ES (3) | ES2770704T3 (ja) |
PL (2) | PL3163571T3 (ja) |
WO (1) | WO2016017238A1 (ja) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
PL3163571T3 (pl) * | 2014-07-28 | 2020-05-18 | Nippon Telegraph And Telephone Corporation | Coding of a sound signal |
CN114898761A (zh) | 2017-08-10 | 2022-08-12 | 华为技术有限公司 | Stereo signal encoding and decoding method and device |
CN110868220B (zh) * | 2018-08-28 | 2021-09-07 | 株洲中车时代电气股份有限公司 | Method for configuring identity identifiers of vehicle equipment and detecting anomalies |
CN113948085B (zh) * | 2021-12-22 | 2022-03-25 | 中国科学院自动化研究所 | Speech recognition method, ***, electronic device, and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5742734A (en) * | 1994-08-10 | 1998-04-21 | Qualcomm Incorporated | Encoding rate selection in a variable rate vocoder |
JP2005534950A (ja) * | 2002-05-31 | 2005-11-17 | ヴォイスエイジ・コーポレーション | Method and device for efficient frame erasure concealment in linear-prediction-based speech codecs |
JP2012098735A (ja) * | 2006-07-31 | 2012-05-24 | Qualcomm Inc | Systems, methods, and apparatus for wideband encoding and decoding of inactive frames |
Family Cites Families (42)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
HU0004768D0 (ja) * | 1994-03-31 | 2001-02-28 | Arbitron Co | |
US5450490A (en) * | 1994-03-31 | 1995-09-12 | The Arbitron Company | Apparatus and methods for including codes in audio signals and decoding |
US5774846A (en) * | 1994-12-19 | 1998-06-30 | Matsushita Electric Industrial Co., Ltd. | Speech coding apparatus, linear prediction coefficient analyzing apparatus and noise reducing apparatus |
JP3317470B2 (ja) * | 1995-03-28 | 2002-08-26 | 日本電信電話株式会社 | Acoustic signal encoding method and acoustic signal decoding method |
US7058572B1 (en) * | 2000-01-28 | 2006-06-06 | Nortel Networks Limited | Reducing acoustic noise in wireless and landline based telephony |
JP3612260B2 (ja) * | 2000-02-29 | 2005-01-19 | 株式会社東芝 | Speech encoding method and device, and speech decoding method and device |
JP3453116B2 (ja) * | 2000-09-26 | 2003-10-06 | パナソニック モバイルコミュニケーションズ株式会社 | Speech encoding method and device |
JP3426207B2 (ja) * | 2000-10-26 | 2003-07-14 | 三菱電機株式会社 | Speech encoding method and device |
US7200561B2 (en) * | 2001-08-23 | 2007-04-03 | Nippon Telegraph And Telephone Corporation | Digital signal coding and decoding methods and apparatuses and programs therefor |
JP3960932B2 (ja) * | 2002-03-08 | 2007-08-15 | 日本電信電話株式会社 | Digital signal encoding method, decoding method, encoding device, decoding device, digital signal encoding program, and decoding program |
US7599835B2 (en) * | 2002-03-08 | 2009-10-06 | Nippon Telegraph And Telephone Corporation | Digital signal encoding method, decoding method, encoding device, decoding device, digital signal encoding program, and decoding program |
US7379864B2 (en) * | 2003-05-06 | 2008-05-27 | Lucent Technologies Inc. | Method and apparatus for the detection of previous packet loss in non-packetized speech |
CN102280109B (zh) * | 2004-05-19 | 2016-04-27 | 松下电器(美国)知识产权公司 | Encoding device, decoding device, and methods thereof |
US7752039B2 (en) * | 2004-11-03 | 2010-07-06 | Nokia Corporation | Method and device for low bit rate speech coding |
US20060224381A1 (en) * | 2005-04-04 | 2006-10-05 | Nokia Corporation | Detecting speech frames belonging to a low energy sequence |
CN101496097A (zh) * | 2006-07-31 | 2009-07-29 | 高通股份有限公司 | *** and method for including an identifier in a packet associated with a speech signal |
CN101140759B (zh) * | 2006-09-08 | 2010-05-12 | 华为技术有限公司 | Bandwidth extension method and *** for speech or audio signals |
US9653088B2 (en) * | 2007-06-13 | 2017-05-16 | Qualcomm Incorporated | Systems, methods, and apparatus for signal encoding using pitch-regularizing and non-pitch-regularizing coding |
EP2077551B1 (en) * | 2008-01-04 | 2011-03-02 | Dolby Sweden AB | Audio encoder and decoder |
US8560307B2 (en) * | 2008-01-28 | 2013-10-15 | Qualcomm Incorporated | Systems, methods, and apparatus for context suppression using receivers |
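CN101727906B (zh) * | 2008-10-29 | 2012-02-01 | 华为技术有限公司 | Method and device for encoding and decoding high-band signals |
CN101763856B (zh) * | 2008-12-23 | 2011-11-02 | 华为技术有限公司 | Signal classification processing method, classification processing device, and encoding *** |
CN101615395B (zh) * | 2008-12-31 | 2011-01-12 | 华为技术有限公司 | Signal encoding and decoding method, device, and *** |
CN101770775B (zh) * | 2008-12-31 | 2011-06-22 | 华为技术有限公司 | Signal processing method and device |
CN101552006B (zh) * | 2009-05-12 | 2011-12-28 | 武汉大学 | Method and device for adjusting energy and phase of windowed signals in the MDCT domain |
KR20100136890A (ko) * | 2009-06-19 | 2010-12-29 | 삼성전자주식회사 | Context-based arithmetic encoding apparatus and method, and arithmetic decoding apparatus and method |
KR20130088756A (ko) * | 2010-06-21 | 2013-08-08 | 파나소닉 주식회사 | Decoding device, encoding device, and methods thereof |
CN102446508B (zh) * | 2010-10-11 | 2013-09-11 | 华为技术有限公司 | Method and device for selecting window type in unified speech and audio coding |
JP5694751B2 (ja) * | 2010-12-13 | 2015-04-01 | 日本電信電話株式会社 | Encoding method, decoding method, encoding device, decoding device, program, and recording medium |
CN103329199B (zh) * | 2011-01-25 | 2015-04-08 | 日本电信电话株式会社 | Encoding method, encoding device, periodic feature amount determination method, periodic feature amount determination device, program, and recording medium |
CN102800317B (zh) * | 2011-05-25 | 2014-09-17 | 华为技术有限公司 | Signal classification method and device, and encoding/decoding method and device |
TWI671736B (zh) * | 2011-10-21 | 2019-09-11 | 南韓商三星電子股份有限公司 | Apparatus for encoding the envelope of a signal and apparatus for decoding the same |
CN103366750B (zh) * | 2012-03-28 | 2015-10-21 | 北京天籁传音数字技术有限公司 | Sound encoding/decoding device and method thereof |
CN104217727B (zh) * | 2013-05-31 | 2017-07-21 | 华为技术有限公司 | Signal decoding method and device |
FR3013496A1 (fr) * | 2013-11-15 | 2015-05-22 | Orange | Transition from transform coding/decoding to predictive coding/decoding |
KR101841380B1 (ko) * | 2014-01-13 | 2018-03-22 | 노키아 테크놀로지스 오와이 | Multi-channel audio signal classifier |
DK3379535T3 (da) * | 2014-05-08 | 2019-12-16 | Ericsson Telefon Ab L M | Audio signal classifier |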
GB2526128A (en) * | 2014-05-15 | 2015-11-18 | Nokia Technologies Oy | Audio codec mode selector |
WO2015174912A1 (en) * | 2014-05-15 | 2015-11-19 | Telefonaktiebolaget L M Ericsson (Publ) | Audio signal classification and coding |
PL3163571T3 (pl) * | 2014-07-28 | 2020-05-18 | Nippon Telegraph And Telephone Corporation | Coding of a sound signal |
TWI602172B (zh) * | 2014-08-27 | 2017-10-11 | 弗勞恩霍夫爾協會 | Encoder, decoder, and method for encoding and decoding audio content using parameters for enhancing concealment |
US10049684B2 (en) * | 2015-04-05 | 2018-08-14 | Qualcomm Incorporated | Audio bandwidth selection |
- 2015
  - 2015-05-15 PL: application PL15826810T, publication PL3163571T3 (pl), status: unknown
  - 2015-05-15 CN: application CN201580041465.4A, publication CN106796801B (zh), status: active
  - 2015-05-15 ES: application ES15826810T, publication ES2770704T3 (es), status: active
  - 2015-05-15 EP: application EP20200287.9A, publication EP3796314B1 (en), status: active
  - 2015-05-15 PL: application PL20200287T, publication PL3796314T3 (pl), status: unknown
  - 2015-05-15 ES: application ES20200287T, publication ES2908564T3 (es), status: active
  - 2015-05-15 KR: application KR1020177002231A, publication KR101993828B1 (ko), status: active (IP Right Grant)
  - 2015-05-15 EP: application EP19201443.9A, publication EP3614382B1 (en), status: active
  - 2015-05-15 WO: application PCT/JP2015/063989, publication WO2016017238A1 (ja), status: active (Application Filing)
  - 2015-05-15 CN: application CN202110195328.3A, publication CN112992164A (zh), status: active (Pending)
  - 2015-05-15 EP: application EP15826810.2A, publication EP3163571B1 (en), status: active
  - 2015-05-15 KR: application KR1020197011029A, publication KR102049294B1 (ko), status: active (IP Right Grant)
  - 2015-05-15 JP: application JP2016538178A, publication JP6411509B2 (ja), status: active
  - 2015-05-15 CN: application CN202110191341.1A, publication CN112992163A (zh), status: active (Pending)
  - 2015-05-15 US: application US15/327,490, publication US10304472B2 (en), status: active
  - 2015-05-15 KR: application KR1020197018004A, publication KR102061316B1 (ko), status: active (IP Right Grant)
  - 2015-05-15 CN: application CN202110195414.4A, publication CN112992165A (zh), status: active (Pending)
  - 2015-05-15 ES: application ES19201443T, publication ES2838006T3 (es), status: active
- 2018
  - 2018-04-25 JP: application JP2018083901A, publication JP6608993B2 (ja), status: active
- 2019
  - 2019-03-07 US: application US16/295,039, publication US10629217B2 (en), status: active
  - 2019-07-31 JP: application JP2019140886A, publication JP6739604B2 (ja), status: active
- 2020
  - 2020-02-05 US: application US16/782,725, publication US11043227B2 (en), status: active
  - 2020-02-05 US: application US16/782,700, publication US11037579B2 (en), status: active
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6739604B2 (ja) | | Encoding method, device, program, and recording medium |
US10734009B2 (en) | | Periodic-combined-envelope-sequence generation device, periodic-combined-envelope-sequence generation method, periodic-combined-envelope-sequence generation program and recording medium |
JP6509973B2 (ja) | | Encoding method, encoding device, program, and recording medium |
EP2546994B1 (en) | | Coding method, decoding method, apparatus, program and recording medium |
US10199046B2 (en) | | Encoder, decoder, coding method, decoding method, coding program, decoding program and recording medium |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 15826810; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2016538178; Country of ref document: JP; Kind code of ref document: A |
WWE | WIPO information: entry into national phase | Ref document number: 15327490; Country of ref document: US |
ENP | Entry into the national phase | Ref document number: 20177002231; Country of ref document: KR; Kind code of ref document: A |
REEP | Request for entry into the European phase | Ref document number: 2015826810; Country of ref document: EP |
WWE | WIPO information: entry into national phase | Ref document number: 2015826810; Country of ref document: EP |
NENP | Non-entry into the national phase | Ref country code: DE |