WO2000030075A1 - Closed-loop variable-rate multimode predictive speech coder - Google Patents

Info

Publication number
WO2000030075A1
Authority
WO
WIPO (PCT)
Prior art keywords
coding
coding mode
mode
speech
threshold value
Application number
PCT/US1999/026850
Other languages
English (en)
French (fr)
Inventor
Amitava Das
Sharath Manjunath
Andrew P. Dejaco
Original Assignee
Qualcomm Incorporated
Application filed by Qualcomm Incorporated
Priority to KR1020017006035A (KR20010087393A)
Priority to EP99957560A (EP1129451A1)
Priority to JP2000583004A (JP2002530706A)
Priority to AU15243/00A (AU1524300A)
Publication of WO2000030075A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/24 Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding

Definitions

  • The present invention pertains generally to the field of speech processing, and more specifically to closed-loop, variable-rate, multimode, predictive coding of speech.
  • Speech coders typically comprise an encoder and a decoder, which together form a codec.
  • The encoder analyzes the incoming speech frame to extract certain relevant parameters, and then quantizes the parameters into a binary representation, i.e., a set of bits or a binary data packet.
  • The data packets are transmitted over the communication channel to a receiver and a decoder.
  • The decoder processes the data packets, unquantizes them to produce the parameters, and then resynthesizes the speech frames using the unquantized parameters.
  • The function of the speech coder is to compress the digitized speech signal into a low-bit-rate signal by removing all of the natural redundancies inherent in speech.
  • The challenge is to retain high voice quality in the decoded speech while achieving the target compression factor.
  • The performance of a speech coder depends on (1) how well the speech model, i.e., the combination of the analysis and synthesis processes described above, performs, and (2) how well the parameter quantization process is performed at the target bit rate of N0 bits per frame. The goal of the speech model is thus to capture the essence of the speech signal, or the target voice quality, with a small set of parameters for each frame.
  • A multimode coder applies different modes, or encoding-decoding algorithms, to different types of input speech frames.
  • Each mode, or encoding-decoding process, is customized to represent a certain type of speech segment (i.e., voiced speech, unvoiced speech, or background noise) in the most efficient manner.
  • An external mode decision mechanism examines the input speech frame and makes a decision regarding which mode to apply to the frame.
  • The mode decision is made in an open-loop fashion by extracting a number of parameters from the input frame and evaluating them to decide which mode to apply.
  • Hence, the mode decision is made without knowing in advance the exact condition of the output speech, i.e., how similar the output speech will be to the input speech in terms of voice quality or any other performance measure.
  • An exemplary open-loop mode decision for a speech codec is described in U.S. Patent No. 5,414,796, which is assigned to the assignee of the present invention and fully incorporated herein by reference.
  • Multimode coding can be fixed-rate, using the same number of bits N0 for each frame, or variable-rate, in which different bit rates are used for different modes.
  • The goal in variable-rate coding is to use only the number of bits needed to encode the codec parameters to a level adequate to obtain the target quality.
  • VBR: variable bit rate
  • Conventional VBR speech coders are designed with modes having different bit-rates.
  • An exemplary variable-rate speech coder is described in U.S. Patent No. 5,414,796, assigned to the assignee of the present invention and previously fully incorporated herein by reference.
  • The codec described in the aforesaid patent has the following four rates: (1) full rate (FR); (2) half rate (HR); (3) quarter rate (QR); and (4) eighth rate (ER).
  • FR: full rate
  • HR: half rate
  • QR: quarter rate
  • ER: eighth rate
  • In these four modes, each frame of speech is encoded with 160, 80, 40, and 20 bits, respectively.
  • An external open-loop mode decision is made regarding which mode (FR, HR, QR or ER) to apply to the input speech frame.
  • The application areas for low-rate speech coding include wireless telephony, satellite communications, Internet telephony, various multimedia and voice-streaming applications, voice mail, and other voice storage systems.
  • The driving forces are the need for high capacity and the demand for robust performance under packet-loss conditions.
  • Various recent speech coding standardization efforts are another direct driving force propelling research and development of low-rate speech coding algorithms.
  • A low-rate speech coder creates more channels, or users, per allowable application bandwidth, and a low-rate speech coder coupled with an additional layer of suitable channel coding can fit the overall bit budget of coder specifications and deliver robust performance under channel error conditions.
  • Conventional speech coders typically use some form of prediction mechanism to encode the current frame.
  • To perform the prediction, a speech coder exploits the information contained in the last decoded and recreated frame. This works well because there is typically strong correlation, or similarity, between successive frames.
  • P(n) is a conventional prediction filter that produces an approximation of the current frame from the past quantized frame, and Ê_cur(n) is the quantized version of the prediction error E_cur(n) of the current frame.
  • SNR: signal-to-noise ratio
  • PSNR: perceptual SNR
  • The prediction filter information is necessarily sent to the decoder as a certain number of bits, Np.
  • The remaining available bits, N0 - Np, can be used to encode the prediction error signal E_cur. If the prediction from the quantized past frame, S_prev_quantized, generates an excellent predicted representation S_cur_predicted of the current frame S_cur, the prediction error E_cur will be small and will have a low dynamic range. Hence, it will be relatively easy to encode the prediction error E_cur with a small number of bits.
  • In conventional speech coders, the total number of bits per frame, N0, is high.
  • For example, the 13 kbps QCELP coder supports 260 bits per 20-ms frame. Therefore, even after allocating Np bits to quantize the prediction filter parameters, enough bits (N0 - Np) remain to encode the prediction error accurately.
  • Np: number of bits used to send the prediction filter information
  • N0 - Np: number of bits remaining to encode the prediction error
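The prediction and bit-budget relationships above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that P(n) reduces to a single gain applied to the past quantized frame (real coders use richer prediction filters), and that E_cur(n) = S_cur(n) - S_cur_predicted(n). The helper names and the value Np = 40 are hypothetical; only N0 = 260 comes from the text.

```python
def predict_frame(s_prev_quantized, gain):
    """Hypothetical one-tap predictor: S_cur_predicted(n) = gain * S_prev_quantized(n)."""
    return [gain * x for x in s_prev_quantized]


def prediction_error(s_cur, s_cur_predicted):
    """E_cur(n) = S_cur(n) - S_cur_predicted(n)."""
    return [c - p for c, p in zip(s_cur, s_cur_predicted)]


def residual_bit_budget(n0, np_bits):
    """Bits left for E_cur once Np bits have been spent on the prediction filter."""
    if not 0 <= np_bits <= n0:
        raise ValueError("Np must lie between 0 and N0")
    return n0 - np_bits


# Two strongly correlated frames (illustrative values only).
s_prev_quantized = [100.0, -50.0, 25.0, 80.0]
s_cur = [98.0, -49.0, 24.0, 79.0]
e_cur = prediction_error(s_cur, predict_frame(s_prev_quantized, gain=0.98))

# The error stays under one unit, while the frame itself spans about 150 units,
# so E_cur has a far lower dynamic range than S_cur.
bits_for_error = residual_bit_budget(260, 40)  # N0 = 260 from the text; Np = 40 is hypothetical
```

The small spread of `e_cur` relative to `s_cur` is the reason a well-predicted frame needs few bits for its prediction error.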
  • A speech coder advantageously includes a codec configured to operate in at least one of a plurality of coding modes; and a closed-loop mode decision module coupled to the codec and configured to apply a first coding mode from the plurality of coding modes to an input speech frame, the first coding mode having a first bit rate that is lower than the bit rate of any other coding mode of the plurality of coding modes, the closed-loop mode decision module being further configured to obtain a performance measure of the codec, compare the performance measure with a threshold value, and, if the performance measure does not exceed the threshold value, reject the first coding mode in favor of a second coding mode having a second bit rate that is greater than the first bit rate.
  • A method of coding speech frames advantageously includes the steps of selecting a first coding mode to apply to a speech frame, the first coding mode having a first bit rate; obtaining a coding performance measure; comparing the coding performance measure with a threshold value; and rejecting the first coding mode in favor of a second coding mode if the coding performance measure does not exceed the threshold value, the second coding mode having a second bit rate that exceeds the first bit rate.
  • A speech coder advantageously includes means for selecting a first coding mode to apply to a speech frame, the first coding mode having a first bit rate; means for obtaining a coding performance measure; means for comparing the coding performance measure with a threshold value; and means for rejecting the first coding mode in favor of a second coding mode if the coding performance measure does not exceed the threshold value, the second coding mode having a second bit rate that exceeds the first bit rate.
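The closed-loop decision recited in these summaries can be sketched in a few lines of Python. This is a minimal illustration, not the claimed implementation: the two coding modes and the performance measure are hypothetical callables, and the toy measure (negative squared error) merely stands in for whatever measure, such as PSNR, a real coder would use.

```python
def closed_loop_decision(frame, first_mode, second_mode, performance, threshold):
    """Apply the lower-rate first mode; fall back to the higher-rate second mode.

    first_mode / second_mode: hypothetical callables returning the decoded frame.
    performance: hypothetical callable scoring (original, decoded); higher is better.
    """
    decoded = first_mode(frame)
    if performance(frame, decoded) > threshold:
        return "first", decoded
    # The measure does not exceed the threshold: reject the first coding mode.
    return "second", second_mode(frame)


def negative_squared_error(original, decoded):
    """Toy performance measure: 0 for a perfect match, more negative when worse."""
    return -sum((o - d) ** 2 for o, d in zip(original, decoded))


def coarse_mode(frame):
    """Stands in for a low-rate mode: rounds every sample to an integer."""
    return [float(round(x)) for x in frame]


def fine_mode(frame):
    """Stands in for a high-rate mode: reproduces the frame exactly."""
    return list(frame)
```

Note that the fallback is evaluated only after the cheaper mode has been tried and measured; that "try, measure, then reject" order is what makes the decision closed-loop rather than open-loop.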
  • FIG. 1 is a block diagram of a communication channel terminated at each end by speech coders.
  • FIG. 2 is a block diagram of an encoder.
  • FIG. 3 is a block diagram of a decoder.
  • FIG. 4 is a flow chart illustrating the steps of a closed-loop, multimode, predictive coding technique for speech frames at low bit rates.
  • A first encoder 10 receives digitized speech samples s(n) and encodes the samples s(n) for transmission on a transmission medium 12, or communication channel 12, to a first decoder 14.
  • The decoder 14 decodes the encoded speech samples and synthesizes an output speech signal s_SYNTH(n).
  • For transmission in the opposite direction, a second encoder 16 encodes digitized speech samples s(n), which are transmitted on a communication channel 18.
  • A second decoder 20 receives and decodes the encoded speech samples, generating a synthesized output speech signal s_SYNTH(n).
  • The speech samples s(n) represent speech signals that have been digitized and quantized in accordance with any of various methods known in the art, including, e.g., pulse code modulation (PCM), companded μ-law, or A-law.
  • PCM: pulse code modulation
  • The speech samples s(n) are organized into frames of input data, wherein each frame comprises a predetermined number of digitized speech samples s(n). In an exemplary embodiment, a sampling rate of 8 kHz is employed, with each 20-ms frame comprising 160 samples.
  • The rate of data transmission may advantageously be varied on a frame-to-frame basis from 8 kbps (full rate) to 4 kbps (half rate) to 2 kbps (quarter rate) to 1 kbps (eighth rate). Varying the data transmission rate is advantageous because lower bit rates may be selectively employed for frames containing relatively less speech information. As understood by those skilled in the art, other sampling rates, frame sizes, and data transmission rates may be used.
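The exemplary numbers above are mutually consistent, which a short Python calculation makes explicit: at 8 kHz a 20-ms frame holds 160 samples, and multiplying each rate in kbps by the 20-ms frame length gives the bits per frame quoted for the FR, HR, QR, and ER modes. The constant and function names are illustrative only.

```python
SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
RATES_KBPS = {"full": 8, "half": 4, "quarter": 2, "eighth": 1}


def samples_per_frame(sample_rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """8000 samples/s * 0.020 s = 160 samples."""
    return sample_rate_hz * frame_ms // 1000


def bits_per_frame(rate_kbps, frame_ms=FRAME_MS):
    """kbps * ms = bits, because (1000 bit/s) * (0.001 s) = 1 bit."""
    return rate_kbps * frame_ms


for name, kbps in RATES_KBPS.items():
    print(f"{name:>7} rate: {kbps} kbps -> {bits_per_frame(kbps)} bits per frame")
```

This prints 160, 80, 40, and 20 bits per frame, matching the four-rate codec described earlier; the same arithmetic gives 13 kbps for 260 bits per 20-ms frame.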
  • The first encoder 10 and the second decoder 20 together comprise a first speech coder, or speech codec.
  • The second encoder 16 and the first decoder 14 together comprise a second speech coder.
  • Speech coders may be implemented with a digital signal processor (DSP), an application-specific integrated circuit (ASIC), discrete gate logic, firmware, or any conventional programmable software module and a microprocessor.
  • The software module could reside in RAM, flash memory, registers, or any other form of writable storage medium known in the art.
  • Alternatively, any conventional processor, controller, or state machine could be substituted for the microprocessor.
  • Exemplary ASICs designed specifically for speech coding are described in U.S. Patent No. 5,727,123, assigned to the assignee of the present invention and fully incorporated herein by reference, and U.S. Application Serial No. 08/197,417, entitled VOCODER ASIC, filed February 16, 1994, assigned to the assignee of the present invention, and fully incorporated herein by reference.
  • As shown in FIG. 2, an encoder 100 that may be used in a speech coder includes a mode decision module 102, a pitch estimation module 104, an LP analysis module 106, an LP analysis filter 108, an LP quantization module 110, and a residue quantization module 112.
  • Input speech frames s(n) are provided to the mode decision module 102, the pitch estimation module 104, the LP analysis module 106, and the LP analysis filter 108.
  • The mode decision module 102 produces a mode index I_M and a mode M based upon the periodicity of each input speech frame s(n).
  • Various methods of classifying speech frames according to periodicity are described in U.S. Application Serial No.
  • The pitch estimation module 104 produces a pitch index I_P and a lag value
  • The LP analysis module 106 performs linear predictive analysis on each input speech frame s(n) to generate an LP parameter a.
  • The LP parameter a is provided to the LP quantization module 110.
  • The LP quantization module 110 also receives the mode M.
  • The LP quantization module 110 produces an LP index I_LP and a quantized LP parameter â.
  • The LP analysis filter 108 receives the quantized LP parameter â in addition to the input speech frame s(n).
  • The LP analysis filter 108 generates an LP residue signal R[n], which represents the error between the input speech frame s(n) and the speech predicted from the quantized LP parameters â.
  • The LP residue R[n], the mode M, and the quantized LP parameter â are provided to the residue quantization module 112. Based upon these values, the residue quantization module 112 produces a residue index and a quantized residue signal R̂[n].
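The LP analysis filter 108 can be sketched as a direct-form FIR filter in Python. This is a minimal sketch under stated assumptions: it uses the common convention A(z) = 1 - Σ a_k z^-k, so that R[n] = s[n] - Σ_{k=1..p} â_k s[n-k], and it starts each frame with zero filter memory, whereas a real encoder would carry filter state across frames. The function name `lp_residue` is hypothetical.

```python
def lp_residue(s, a_hat):
    """LP analysis filter: R[n] = s[n] - sum_{k=1..p} a_hat[k-1] * s[n-k].

    Assumes A(z) = 1 - sum_k a_k z^-k and zero filter memory before the frame.
    """
    p = len(a_hat)
    residue = []
    for n in range(len(s)):
        # Linear prediction of s[n] from the p previous samples (zeros before the frame).
        prediction = sum(a_hat[k] * s[n - 1 - k] for k in range(p) if n - 1 - k >= 0)
        residue.append(s[n] - prediction)
    return residue


print(lp_residue([1.0, 2.0, 3.0, 4.0], [1.0]))  # -> [1.0, 1.0, 1.0, 1.0]
```

In the example, a one-tap predictor turns a ramp into a constant residue, illustrating how a good predictor leaves a residue that is easier to quantize than the speech itself.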
  • As shown in FIG. 3, a decoder 200 that may be used in a speech coder includes an LP parameter decoding module 202, a residue decoding module 204, a mode decoding module 206, and an LP synthesis filter 208.
  • The mode decoding module 206 receives and decodes a mode index I_M, generating therefrom a mode M.
  • The LP parameter decoding module 202 receives the mode M and an LP index I_LP.
  • The LP parameter decoding module 202 decodes the received values to produce a quantized LP parameter â.
  • The residue decoding module 204 receives a residue index, a pitch index I_P, and the mode index I_M.
  • The residue decoding module 204 decodes the received values to generate a quantized residue signal R̂[n].
  • The quantized residue signal R̂[n] and the quantized LP parameter â are provided to the LP synthesis filter 208, which synthesizes a decoded output speech signal ŝ[n] therefrom.
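The LP synthesis filter 208 inverts the analysis step: it is the all-pole filter 1/A(z). The minimal Python sketch below assumes the same convention as a typical encoder, A(z) = 1 - Σ a_k z^-k, giving ŝ[n] = R̂[n] + Σ_{k=1..p} â_k ŝ[n-k], with zero filter memory at the start of the frame. The function name `lp_synthesis` is hypothetical.

```python
def lp_synthesis(r_hat, a_hat):
    """LP synthesis filter: s_hat[n] = R_hat[n] + sum_{k=1..p} a_hat[k-1] * s_hat[n-k].

    All-pole inverse of the analysis filter, assuming A(z) = 1 - sum_k a_k z^-k
    and zero filter memory before the frame.
    """
    p = len(a_hat)
    s_hat = []
    for n in range(len(r_hat)):
        # Feedback from previously synthesized output samples.
        feedback = sum(a_hat[k] * s_hat[n - 1 - k] for k in range(p) if n - 1 - k >= 0)
        s_hat.append(r_hat[n] + feedback)
    return s_hat


print(lp_synthesis([1.0, 1.0, 1.0, 1.0], [1.0]))  # -> [1.0, 2.0, 3.0, 4.0]
```

Without quantization, feeding the analysis residue back through this filter with the same coefficients reproduces the input exactly; in a real codec, the difference between ŝ[n] and s(n) comes from quantizing â and R̂[n].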
  • A multimode coder first uses an open-loop mode decision, relying on parameters extracted from the current frame to classify the current frame as background noise/silence (N), unvoiced speech (UV), or voiced speech (V).
  • N-type frames are coded with an eighth-rate mode.
  • UV-type frames are coded with a quarter-rate mode.
  • V-type frames (i.e., voiced speech frames) are coded with either a half-rate mode or a full-rate mode.
  • The full-rate mode may advantageously be a prediction-based coding scheme with adequate bits to accurately encode various types of voiced speech, delivering a perceptual signal-to-noise ratio (PSNR) well above the target PSNR (a predefined or variable threshold value).
  • PSNR: perceptual signal-to-noise ratio
  • The half-rate mode is advantageously a prediction-based coding scheme designed to encode frames having a high degree of correlation with the previous frame (i.e., frames that are quite similar to the previous frame).
  • The number of bits available in the half-rate mode is adequate to encode the prediction parameters for frames with high correlation, as well as the prediction error, which is relatively small due to the high correlation between successive frames.
  • Such frames are typically encountered in steady voiced speech segments, which are therefore amenable to half-rate coding.
  • However, the performance of prediction-based coding schemes also depends on how accurately the previous frame is quantized.
  • Therefore, a closed-loop mode selection process is employed after the open-loop mode decision to ensure that the coding performance exceeds the predefined (or variable) target PSNR value.
  • The open-loop mode decision need not necessarily be applied at all.
  • The flow chart of FIG. 4 illustrates a closed-loop, multimode, predictive coding technique for speech frames at low bit rates, in accordance with one embodiment.
  • First, a frame number counter is set equal to 1.
  • The algorithm then proceeds to step 302, starting the coding process.
  • The algorithm then proceeds to step 304.
  • In step 304, the algorithm checks the current frame and the previous quantized frame.
  • The algorithm then proceeds to step 306.
  • In step 306, the algorithm determines whether the current frame should be classified as silence or background noise. This determination is made in accordance with various conventional techniques for measuring frame energy, such as, e.g., calculating the sum of squares. If the frame is classified as silence or background noise, the algorithm proceeds to step 308.
  • In step 308, the algorithm applies an eighth-rate coding mode to the frame.
  • If the frame is not classified as silence or background noise, the algorithm proceeds to step 312. In step 312, the algorithm determines whether the current frame should be classified as unvoiced speech. This determination is made in accordance with various known methods of periodicity determination, such as, e.g., the use of zero crossings and normalized autocorrelation functions (NACFs). These techniques are described in the aforementioned U.S. Application Serial No. 08/815,354, previously fully incorporated herein by reference. If the frame is classified as unvoiced speech, the algorithm proceeds to step 314. In step 314, a quarter-rate coding mode is applied to the frame. The algorithm then proceeds to step 310.
  • NACFs: normalized autocorrelation functions
  • If, in step 312, the frame is not classified as unvoiced speech, the algorithm proceeds to step 316, considering the frame to contain voiced speech.
  • In step 316, the algorithm encodes the frame with a half-rate, prediction-based coding mode.
  • In step 318, the PSNR is computed.
  • The algorithm then proceeds to step 320.
  • In step 320, the algorithm determines whether the computed PSNR is greater than a predefined threshold, or target, PSNR value.
  • The threshold, or target, PSNR value may be a function of average bit rate. For example, the average bit rate may be calculated periodically and fed back to the algorithm, which adjusts the target threshold value accordingly. Further, it should be understood that any conventional measure of performance may be substituted for PSNR.
  • If the computed PSNR exceeds the target PSNR, the algorithm proceeds to step 322. In step 322, a half-rate coding mode is applied to the frame. The algorithm then proceeds to step 310. If, on the other hand, in step 320 the computed PSNR does not exceed the target PSNR, the algorithm proceeds to step 324. In step 324, the algorithm applies a full-rate coding mode to the frame. The algorithm then proceeds to step 310.
  • In step 310, the frame number counter is incremented by 1.
  • The algorithm then proceeds to step 326.
  • In step 326, the algorithm determines whether the frame number counter value is greater than or equal to the total number of frames that must be processed (i.e., whether there are any remaining frames to process). If the frame number counter value is less than the total number of frames to be processed, the algorithm returns to step 302, beginning the coding process for the next frame. If, on the other hand, the frame number counter value is greater than or equal to the total number of frames to be processed, the algorithm proceeds to step 328, ending the coding process.
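The FIG. 4 flow can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the patented implementation. The unvoiced test uses the zero-crossing rate alone (the NACF check is omitted). A plain SNR stands in for PSNR, which the text allows since any performance measure may be substituted. The energy, zero-crossing, and SNR thresholds are hypothetical parameters, and the four codecs are hypothetical callables that return a decoded frame. The frame counter and step 326 become an ordinary `for` loop.

```python
import math


def frame_energy(frame):
    """Sum of squares, the energy measure mentioned for step 306."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / max(len(frame) - 1, 1)


def snr_db(reference, decoded):
    """Plain SNR in dB, used here only as a stand-in for the PSNR measure."""
    noise = sum((r - d) ** 2 for r, d in zip(reference, decoded))
    signal = frame_energy(reference)
    if noise == 0:
        return math.inf
    if signal == 0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


def code_frames(frames, codecs, target_db, silence_energy, unvoiced_zcr):
    """Choose a coding mode per frame following the FIG. 4 flow.

    codecs maps "eighth", "quarter", "half", "full" to hypothetical callables
    codec(frame, prev_decoded) -> decoded frame; prev_decoded lets a predictive
    mode use the previous quantized frame.
    """
    chosen = []
    prev_decoded = None
    for frame in frames:  # replaces the frame counter and the step 326 test
        if frame_energy(frame) < silence_energy:        # step 306
            mode = "eighth"                               # step 308
        elif zero_crossing_rate(frame) > unvoiced_zcr:    # step 312 (ZCR only)
            mode = "quarter"                              # step 314
        else:
            mode = "half"                                 # step 316
        decoded = codecs[mode](frame, prev_decoded)
        # Steps 318-324: keep half rate only if it beats the target, else use full rate.
        if mode == "half" and snr_db(frame, decoded) <= target_db:
            mode = "full"
            decoded = codecs[mode](frame, prev_decoded)
        prev_decoded = decoded
        chosen.append(mode)
    return chosen


# Toy codecs for a demonstration: "half" rounds samples, "full" is exact.
def zeros(frame, prev):
    return [0.0] * len(frame)


def rounded(frame, prev):
    return [float(round(x)) for x in frame]


def exact(frame, prev):
    return list(frame)


demo_codecs = {"eighth": zeros, "quarter": zeros, "half": rounded, "full": exact}
demo_frames = [
    [0.01, -0.01, 0.01, -0.01],  # very low energy: silence/background noise
    [1.0, -1.0, 1.0, -1.0],      # many zero crossings: treated as unvoiced
    [1.0, 2.0, 3.0, 4.0],        # voiced, and half rate happens to be exact
    [1.4, 2.4, 3.4, 4.4],        # voiced, but half rate falls below the target
]
modes = code_frames(demo_frames, demo_codecs, target_db=20.0,
                    silence_energy=0.01, unvoiced_zcr=0.5)
```

With these toy inputs, the four frames receive eighth, quarter, half, and full rate, respectively. The last frame is upgraded because rounding leaves an SNR of about 17.8 dB, below the 20 dB target.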
  • The full-rate coding mode described above with respect to FIG. 4 could be any higher-bit-rate predictive mechanism (i.e., one with any bit rate greater than half rate).
  • Alternatively, a higher-bit-rate, direct coding mechanism is substituted for the full-rate, predictive coding mode.
  • The direct coding mode encodes the current speech frame or residue without using any information from the previous frame.
  • A direct encoding method is appropriate for speech segments in which there is no similarity between the current frame and the previous frame.
  • One example is the onset of a voiced segment.
  • Another example is an unvoiced-to-voiced segment transition.
  • A direct encoding method is also useful in the middle of voiced segments when the cumulative effect of prediction-based encoding has degraded the past quantized frame so much that it is too far out of sync with the corresponding original speech frame. In this case, predictive coding will fail, even at much higher bit rates, due to the lack of similarity between the past quantized frame and the past original frame.
  • A fresh capture of the current frame with a direct encoding method not only enhances the preservation of the current frame, but also facilitates prediction-based encoding of the next and later frames, because the prediction mechanism is aided by a more accurate memory.
  • The R1 coding method is a higher-rate, direct coding method.
  • The R2 coding method is a lower-rate, predictive coding method.
  • A closed-loop decision is performed such that the R2 coding method is tried first, its performance is checked by comparing a performance measure with a threshold, and the algorithm switches to the R1 coding method if the performance of the R2 coding mode is insufficient.
  • Alternatively, the higher-rate R1 coding mode is tried first, its performance is checked in the same way, and, if the performance is satisfactory, the lower-rate R2 coding mode is tried.
  • The performance check is then performed for the R2 coding mode, and if the R2 coding mode performance is unsatisfactory, the R1 coding mode is applied to the frame.
  • Multiple coding modes having bit rates R1, R2, ..., RN-1, RN (where R1 > R2 > ... > RN-1 > RN) are employed.
  • A closed-loop decision is performed such that the lowest rate, RN, is tried first. If the RN coding mode performs adequately, the RN coding mode is retained for the frame. Otherwise, the next, higher-rate coding mode, RN-1, is applied. The process is reiterated until either a coding mode performs adequately or the highest-rate mode, R1, is retained. In an alternate embodiment, the highest rate, R1, is tried first. If the R1 mode performs adequately, the next, lower-rate coding mode, R2, is tried. The process is continued until a given coding mode does not perform adequately (at which time the last coding mode to perform adequately is applied), or until the lowest-rate coding mode, RN, performs satisfactorily and is applied.
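The two N-mode search orders described above can be sketched in Python. In this minimal illustration, modes are identified simply by their rate in bits per frame, and `performs_adequately` is a hypothetical callable standing in for the performance-versus-threshold test. The text does not say what happens if R1 itself fails in the downward search; keeping R1 in that case is an assumption of this sketch.

```python
def search_upward(frame, rates_low_to_high, performs_adequately):
    """Try RN first and step up in rate; R1 is retained if nothing lower suffices."""
    for rate in rates_low_to_high[:-1]:
        if performs_adequately(frame, rate):
            return rate
    return rates_low_to_high[-1]


def search_downward(frame, rates_high_to_low, performs_adequately):
    """Try R1 first and step down while modes stay adequate.

    Returns the last adequate mode. If R1 itself falls short, R1 is kept;
    that case is not covered by the text and is an assumption here.
    """
    chosen = rates_high_to_low[0]
    for rate in rates_high_to_low:
        if not performs_adequately(frame, rate):
            break
        chosen = rate
    return chosen


def needs_50_bits(frame, rate):
    """Toy adequacy test: pretend any mode with at least 50 bits per frame suffices."""
    return rate >= 50
```

With `needs_50_bits`, both searches over 160, 80, 40, and 20 bits settle on 80 bits. The two orders agree whenever adequacy improves monotonically with rate; if it does not, the upward search stops at the first adequate mode found from below, while the downward search stops at the first failure found from above.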
  • Multiple coding modes having bit rates R1, R2, ..., Rm-1, Rm, Rm+1, ..., RN are employed.
  • The bit rates have the following relative magnitudes: R1 > R2 > ... > Rm-1 > Rm > Rm+1 > ... > RN.
  • A closed-loop mode decision works in conjunction with an open-loop mode decision.
  • The open-loop mode decision, based upon parameters such as frame energy or frame periodicity, tells the coder to apply a mode with a bit rate of Rm, at which point the closed-loop mode decision takes over.
  • The closed-loop mode decision applies the Rm coding mode, tests performance, and maintains the Rm coding mode if performance is satisfactory.
  • Otherwise, the closed-loop mode decision tries the next, higher-rate coding mode, Rm-1. The process is reiterated until either a coding mode performs adequately or the highest-rate mode, R1, is retained. Alternatively, the closed-loop mode decision applies the Rm coding mode and tests performance; if performance is satisfactory, the closed-loop mode decision tries the next, lower-rate coding mode, Rm+1. The process is reiterated until either a coding mode performs inadequately (at which time the last coding mode to perform adequately is applied), or the lowest-rate mode, RN, is retained.
  • Multiple coding modes having bit rates R1, R2, ..., RN (where R1 > R2 > ... > RN) are employed. All of the coding modes are applied in parallel to the input speech frame, and the performances of the coding modes are compared with a set of N threshold performance measures. The coding mode that appears to produce the most accurate result is selected.
  • Alternatively, multiple coding modes having bit rates R1, R2, ..., RN are employed. All of the coding modes are applied in parallel to the input speech frame, and the performances of the coding modes are compared with a set of N threshold performance measures. If several coding modes exceed the performance threshold target, the coding mode having the lowest bit rate (among those performing above the performance threshold) is selected.
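The two parallel-evaluation variants can be sketched in Python. In this minimal illustration, each mode is a hypothetical (rate, encode) pair, `performance` is a hypothetical scoring callable, and the modes run one after another, although a real coder could evaluate them concurrently. Falling back to the most accurate mode when no mode beats its threshold is an assumption, since the text does not cover that case.

```python
def evaluate_all(frame, modes, performance):
    """Run every (rate, encode) mode on the frame and score each decoded result.

    Modes run sequentially here; a real coder could evaluate them in parallel.
    """
    return [(rate, performance(frame, encode(frame))) for rate, encode in modes]


def most_accurate(scores):
    """First variant: pick the mode with the best performance score."""
    return max(scores, key=lambda rate_score: rate_score[1])[0]


def lowest_rate_passing(scores, thresholds):
    """Second variant: among modes beating their thresholds, pick the lowest rate."""
    passing = [rate for (rate, score), limit in zip(scores, thresholds) if score > limit]
    if passing:
        return min(passing)
    return most_accurate(scores)  # assumption: the text does not cover this case


# Made-up (rate in bits per frame, score) pairs for illustration.
example_scores = [(160, 30.0), (80, 22.0), (40, 12.0)]
```

With a threshold of 20 for every mode, the first variant picks 160 bits (the highest score) while the second picks 80 bits, the cheapest mode that still clears its threshold. The second variant therefore trades a little quality for a lower average bit rate.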
  • Multiple coding modes having bit rates R1, R2, ..., Half Rate, ..., Quarter Rate, ..., RN (where R1 is Full Rate and RN is Eighth Rate) are employed.
  • A closed-loop mode decision works in conjunction with an open-loop mode decision.
  • The open-loop mode decision, based upon parameters such as frame energy or frame periodicity, tells the coder to apply the full-rate coding mode to unvoiced-to-voiced transition frames, voiced-to-voiced transition frames, nonstationary voiced segments, and nonstationary unvoiced segments. Also based upon frame parameters, the open-loop mode decision tells the coder to apply the half-rate coding mode to steady voiced segments that exhibit a significant degree of similarity from frame to frame.
  • Likewise, the open-loop mode decision tells the coder to apply the quarter-rate coding mode to steady unvoiced segments. Also based upon frame parameters, the open-loop mode decision tells the coder to apply the eighth-rate coding mode to background noise and other nonspeech signals such as silence.
  • The closed-loop mode decision then takes over. It applies the coding mode selected by the open-loop mode decision, tests performance, and maintains the selected coding mode if performance is satisfactory. Otherwise, the closed-loop mode decision tries the next, higher-rate coding mode. The process is reiterated until either a coding mode performs adequately or the full-rate mode is retained.
  • Alternatively, the closed-loop mode decision applies the coding mode selected by the open-loop mode decision and tests performance; if performance is satisfactory, the closed-loop mode decision tries the next, lower-rate coding mode. The process is reiterated until either a coding mode performs inadequately (at which time the last coding mode to perform adequately is applied), or the lowest-rate mode is retained.
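The combination of open-loop and closed-loop decisions described above can be sketched in Python. This minimal illustration implements only the upward search. The frame-class labels are hypothetical names for the segment types listed in the text, and `performs_adequately` is a hypothetical callable standing in for the performance test.

```python
RATE_ORDER = ["eighth", "quarter", "half", "full"]  # lowest to highest bit rate

# Open-loop choices listed in the text, keyed by hypothetical frame-class labels.
OPEN_LOOP_MODE = {
    "background_noise_or_silence": "eighth",
    "steady_unvoiced": "quarter",
    "steady_voiced": "half",
    "unvoiced_to_voiced_transition": "full",
    "voiced_to_voiced_transition": "full",
    "nonstationary_voiced": "full",
    "nonstationary_unvoiced": "full",
}


def closed_loop_after_open_loop(frame, frame_class, performs_adequately):
    """Start at the open-loop mode and step up in rate until performance is adequate."""
    start = RATE_ORDER.index(OPEN_LOOP_MODE[frame_class])
    for mode in RATE_ORDER[start:-1]:
        if performs_adequately(frame, mode):
            return mode
    return RATE_ORDER[-1]  # the full-rate mode is retained
```

Because the open-loop decision supplies the starting point, most frames need only one or two closed-loop trials, and frames that the open-loop decision already assigns to full rate need no search at all.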
  • The MCCi and Mi coding modes each use the same source-coding mode (i.e., the same encoder and decoder).
  • The MCCi coding mode includes an additional layer of channel protection, in which (RCCi - Ri) bits are used for robust protection of the parameters of the Mi coding mode under the worst possible channel condition of the communication system.
  • The performance, or voice quality, delivered by the Mi coding mode under channel-error-free conditions is similar to the performance, or voice quality, delivered by the MCCi coding mode under the worst possible channel error condition.
  • The (RCCi - Ri) channel coding bits serve to provide adequate protection under the assumed, or target, worst channel condition.
  • The assumed worst channel condition may advantageously be, e.g., a predefined frame error rate (FER) percentage.
  • FER: frame error rate
  • A closed-loop mode decision advantageously accounts for both channel variation and source variation to deliver a guaranteed quality of service. For example, a source-controlled, closed-loop mode decision such as described above is applied first. The closed-loop mode decision tells the coder to use the Mi coding mode.
  • For the corresponding coding mode MCCi,j, (RCCi,j - Ri) represents the minimum number of bits needed to add channel error protection to the channel coding layer so that the channel error protection will be adequate for the worst-case scenario in the j-th channel error condition.
  • Such a closed-loop, combined-network-and-source-controlled codec delivers guaranteed quality of service across various channel conditions while also delivering a low average bit rate.
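The channel-aware step can be sketched in Python once the source-controlled closed-loop decision has chosen mode Mi. The sketch assumes, hypothetically, that the channel condition j is derived from a measured frame error rate by comparing it with a list of FER bounds, and that a table gives the minimum protection bits for each (i, j) pair; the total rate is then Ri plus those bits. All table values and bounds are made up for illustration; the text gives no numbers.

```python
import bisect


def channel_condition(measured_fer, fer_bounds):
    """Map a measured frame error rate to a condition index j.

    fer_bounds holds ascending upper bounds; len(fer_bounds) + 1 conditions exist.
    """
    return bisect.bisect_left(fer_bounds, measured_fer)


def protected_rate(i, j, source_rates, protection_bits):
    """Total bits per frame: source rate R_i plus protection bits for condition j."""
    return source_rates[i] + protection_bits[i][j]


# Illustrative tables only; none of these numbers come from the text.
SOURCE_RATES = [160, 80, 40, 20]  # R_i in bits per frame
PROTECTION_BITS = [               # extra protection bits for conditions j = 0, 1, 2
    [0, 16, 32],
    [0, 8, 24],
    [0, 8, 16],
    [0, 4, 8],
]
FER_BOUNDS = [0.01, 0.03]  # j = 0: FER <= 1%, j = 1: FER <= 3%, j = 2: worse
```

For example, if the source decision picks mode index 1 (80 bits) and the measured FER is 2%, the sketch selects condition j = 1 and a total of 88 bits per frame. Protection bits are thus spent only when the channel calls for them, which keeps the average bit rate low.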

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
PCT/US1999/026850 1998-11-13 1999-11-12 Closed-loop variable-rate multimode predictive speech coder WO2000030075A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
KR1020017006035A KR20010087393A (ko) 1998-11-13 1999-11-12 Closed-loop variable-rate multimode predictive speech coder
EP99957560A EP1129451A1 (en) 1998-11-13 1999-11-12 Closed-loop variable-rate multimode predictive speech coder
JP2000583004A JP2002530706A (ja) 1998-11-13 1999-11-12 Closed-loop variable-rate multimode predictive speech coder
AU15243/00A AU1524300A (en) 1998-11-13 1999-11-12 Closed-loop variable-rate multimode predictive speech coder

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US19164398A 1998-11-13 1998-11-13
US09/191,643 1998-11-13

Publications (1)

Publication Number Publication Date
WO2000030075A1 (en) 2000-05-25

Family

ID=22706319

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/026850 WO2000030075A1 (en) 1998-11-13 1999-11-12 Closed-loop variable-rate multimode predictive speech coder

Country Status (5)

Country Link
EP (1) EP1129451A1
JP (1) JP2002530706A
KR (1) KR20010087393A
AU (1) AU1524300A
WO (1) WO2000030075A1

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001006490A1 (en) * 1999-07-19 2001-01-25 Qualcomm Incorporated Method and apparatus for maintaining a target bit rate in a speech coder
EP1235204A2 (en) * 2001-02-27 2002-08-28 Mitsubishi Denki Kabushiki Kaisha Method and apparatus for selecting an excitation coding mode for speech coding
KR100804888B1 (ko) * 1999-10-28 2008-02-20 Qualcomm Incorporated Predictive speech coder using coding scheme selection patterns to reduce sensitivity to frame errors
EP1942490A1 (en) 2007-01-06 2008-07-09 Yamaha Corporation Waveform compressing apparatus, waveform decompressing apparatus, and method of producing compressed data
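CN102254562A (zh) * 2011-06-29 2011-11-23 Beijing Institute of Technology Variable-rate audio coding method with switching between adjacent high-rate and low-rate coding modes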
US8260609B2 (en) 2006-07-31 2012-09-04 Qualcomm Incorporated Systems, methods, and apparatus for wideband encoding and decoding of inactive frames
US8532984B2 (en) 2006-07-31 2013-09-10 Qualcomm Incorporated Systems, methods, and apparatus for wideband encoding and decoding of active frames
US8725499B2 (en) 2006-07-31 2014-05-13 Qualcomm Incorporated Systems, methods, and apparatus for signal change detection
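CN118016081A (zh) * 2024-04-10 2024-05-10 Shandong Computer Science Center (National Supercomputer Center in Jinan) Variable-rate speech coding method and *** based on a speech quality grading model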

Citations (2)

Publication number Priority date Publication date Assignee Title
EP0417739A2 (en) * 1989-09-11 1991-03-20 Fujitsu Limited Speech coding apparatus using multimode coding
US5761634A (en) * 1994-02-17 1998-06-02 Motorola, Inc. Method and apparatus for group encoding signals

Non-Patent Citations (3)

Title
CELLARIO L ET AL: "CELP CODING AT VARIABLE RATE", EUROPEAN TRANSACTIONS ON TELECOMMUNICATIONS AND RELATED TECHNOLOGIES, IT, AEI, MILANO, vol. 5, no. 5, 1 September 1994 (1994-09-01), pages 69 - 79, XP000470681, ISSN: 1120-3862 *
DAS A ET AL: "Multimode variable bit rate speech coding: an efficient paradigm for high-quality low-rate representation of speech signal", 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING. PROCEEDINGS. ICASSP99 (CAT. NO.99CH36258), PHOENIX, AZ, USA, 15-19 MARCH 1999, Piscataway, NJ, USA, IEEE, pages 2307 - 2310 vol.4, XP002131859, ISBN: 0-7803-5041-3 *
KLEIDER J E ET AL: "AN ADAPTIVE-RATE DIGITAL COMMUNICATION SYSTEM FOR SPEECH", IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), US, NEW YORK, IEEE, 1997, pages 1695 - 1698, XP000734984, ISBN: 0-8186-7920-4 *

Cited By (14)

Publication number Priority date Publication date Assignee Title
KR100754591B1 (ko) * 1999-07-19 2007-09-05 Qualcomm Incorporated Method and apparatus for maintaining a target bit rate in a speech coder
WO2001006490A1 (en) * 1999-07-19 2001-01-25 Qualcomm Incorporated Method and apparatus for maintaining a target bit rate in a speech coder
KR100804888B1 (ko) * 1999-10-28 2008-02-20 Qualcomm Incorporated Predictive speech coder using coding scheme selection patterns to reduce sensitivity to frame errors
EP1235204B1 (en) * 2001-02-27 2008-10-22 Mitsubishi Denki Kabushiki Kaisha Method and apparatus for selecting an excitation coding mode for speech coding
EP1235204A2 (en) * 2001-02-27 2002-08-28 Mitsubishi Denki Kabushiki Kaisha Method and apparatus for selecting an excitation coding mode for speech coding
EP1235204A3 (en) * 2001-02-27 2003-10-22 Mitsubishi Denki Kabushiki Kaisha Method and apparatus for selecting an excitation coding mode for speech coding
US7130796B2 (en) 2001-02-27 2006-10-31 Mitsubishi Denki Kabushiki Kaisha Voice encoding method and apparatus of selecting an excitation mode from a plurality of excitation modes and encoding an input speech using the excitation mode selected
US8260609B2 (en) 2006-07-31 2012-09-04 Qualcomm Incorporated Systems, methods, and apparatus for wideband encoding and decoding of inactive frames
US8532984B2 (en) 2006-07-31 2013-09-10 Qualcomm Incorporated Systems, methods, and apparatus for wideband encoding and decoding of active frames
US8725499B2 (en) 2006-07-31 2014-05-13 Qualcomm Incorporated Systems, methods, and apparatus for signal change detection
EP1942490A1 (en) 2007-01-06 2008-07-09 Yamaha Corporation Waveform compressing apparatus, waveform decompressing apparatus, and method of producing compressed data
US8706506B2 (en) 2007-01-06 2014-04-22 Yamaha Corporation Waveform compressing apparatus, waveform decompressing apparatus, and method of producing compressed data
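CN102254562A (zh) * 2011-06-29 2011-11-23 Beijing Institute of Technology Variable-rate audio coding method switching between adjacent high-rate and low-rate coding modes
CN118016081A (zh) * 2024-04-10 2024-05-10 Shandong Computer Science Center (National Supercomputer Center in Jinan) Variable-rate speech coding method and *** based on a speech quality grading model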

Also Published As

Publication number Publication date
AU1524300A (en) 2000-06-05
KR20010087393A (ko) 2001-09-15
EP1129451A1 (en) 2001-09-05
JP2002530706A (ja) 2002-09-17

Similar Documents

Publication Publication Date Title
EP1340223B1 (en) Method and apparatus for robust speech classification
US7203638B2 (en) Method for interoperation between adaptive multi-rate wideband (AMR-WB) and multi-mode variable bit-rate wideband (VMR-WB) codecs
JP5543405B2 (ja) Predictive speech coder using a coding scheme pattern that reduces sensitivity to frame errors
EP1129450B1 (en) Low bit-rate coding of unvoiced segments of speech
KR100798668B1 (ko) Method and apparatus for coding unvoiced speech
EP1214705B1 (en) Method and apparatus for maintaining a target bit rate in a speech coder
KR20020081374A (ko) Closed-loop multimode mixed-domain linear prediction (MDLP) speech coder
US20010051873A1 (en) Synthesis of speech from pitch prototype waveforms by time-synchronous waveform interpolation
EP1181687B1 (en) Multipulse interpolative coding of transition speech frames
US6434519B1 (en) Method and apparatus for identifying frequency bands to compute linear phase shifts between frame prototypes in a speech coder
JP2002536694A (ja) Method and means for 1/8-rate random number generation for a speech coder
EP1129451A1 (en) Closed-loop variable-rate multimode predictive speech coder

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2000 15243

Country of ref document: AU

Kind code of ref document: A

AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
WWE WIPO information: entry into national phase

Ref document number: 1999957560

Country of ref document: EP

WWE WIPO information: entry into national phase

Ref document number: 1020017006035

Country of ref document: KR

ENP Entry into the national phase

Ref document number: 2000 583004

Country of ref document: JP

Kind code of ref document: A

WWE WIPO information: entry into national phase

Ref document number: 99815069.X

Country of ref document: CN

WWP WIPO information: published in national office

Ref document number: 1999957560

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWP WIPO information: published in national office

Ref document number: 1020017006035

Country of ref document: KR

WWW WIPO information: withdrawn in national office

Ref document number: 1999957560

Country of ref document: EP

WWW WIPO information: withdrawn in national office

Ref document number: 1020017006035

Country of ref document: KR