TW507194B - Variable-rate residual-transform vocoders using auditory perception approximation - Google Patents

Variable-rate residual-transform vocoders using auditory perception approximation

Info

Publication number
TW507194B
Authority
TW
Taiwan
Prior art keywords
dct
coefficients
rate
speech
gain
Prior art date
Application number
TW89110061A
Other languages
Chinese (zh)
Inventor
Jar-Ferr Yang
Rong-San Lin
Original Assignee
Nat Science Council
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nat Science Council
Priority to TW89110061A
Application granted
Publication of TW507194B


Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

This invention proposes a simple variable-rate vocoder that compresses speech with low computational complexity by combining an adaptive pitch predictor, a discrete cosine transform (DCT) of the prediction residual, and a perceptual representation of the DCT coefficients. To improve speech quality, a pre-emphasis filter is placed in the encoder and a matching de-emphasis filter in the decoder. The proposed coder achieves reasonable quality at much lower computational complexity than traditional code-excited linear prediction (CELP) coders, and it offers several bit rates, with correspondingly different speech quality, within a single coding structure. A real-time implementation of both encoder and decoder on a single low-cost DSP chip confirms that the vocoder produces acceptable speech quality. Because it is simple and variable-rate, the vocoder is suited to digital speech communication over the PSTN, the Internet, and wireless networks where quality of service must be managed.

Description

Publication No. 507194, Case No. 89110061

V. Description of the Invention

Background of the Invention

In the era of multimedia communication, with the spread of the Internet and of mobile radio, transmission bandwidth is often congested. When that happens, the coder can drop to a lower bit rate, trading some speech quality for relief of the congestion, so that the connection is not interrupted. The standards work of the Moving Picture Experts Group (MPEG) likewise shows the attention that speech and audio codec systems now receive.

The low-rate speech codec currently used in such systems (ITU G.723.1, judging from the rates quoted and from a later mention in this passage) relies on a very large algorithm. It offers two bit rates, 5.3 kbps and 6.3 kbps, but the rate cannot be varied beyond these, and the core structures for the two rates are not identical, so a single structure cannot serve both. Another standard (its name is illegible in the source) has an adjustable bit rate, but the complexity of its overall architecture is compared unfavourably with the ITU G.723.1 system. In practical multimedia communication, the encoder and decoder are implemented together on a single DSP or CPU chip.

Prior Art

In a multimedia communication system, the available transmission bandwidth varies widely over time with the number of users and with the environment. With the growth of the Internet and of wireless communication, reducing bandwidth usage is therefore necessary. Since 1975, many codecs have been able to produce high-quality speech at low bit rates. Most of them use vector quantization and obtain the excitation signal through a closed-loop search. The closed-loop search is computationally expensive in the coding procedure (the remainder of this sentence is illegible in the source).

Variable-rate schemes in the current literature require codebooks of different sizes in the coding procedure, so their cost is also high. Y. Shoham, in Proc. ICASSP-93 (1993), pp. II-167 ff., compressed speech in the transform domain; transform coding is a medium-bit-rate technique. Among transform methods, the discrete cosine transform (DCT) makes the rate easy to scale: by changing the number of coded coefficients, the less important DCT coefficients can simply be dropped, at some cost in speech quality. At low rates, however, reconstructing the signal without exploiting the ear's response to sound produces severe distortion.

Object of the Invention

The invention provides a new variable-rate speech codec system whose structure does not need to change when the bit rate changes.
To improve speech quality further, the codec applies a pre-emphasis filter at the encoder and a de-emphasis filter at the decoder. (The rest of this paragraph, which appears to concern use over mobile wireless links, is illegible in the source.)

Detailed Description of the Invention

The invention has been implemented in real time on a DSP chip and on a personal computer. It is a variable-rate speech codec device that applies auditory characteristics to residual transform coding: a variable-rate speech coder that combines pre-emphasis filtering, pitch prediction, and perceptually guided transform coding of the prediction residual, together with the corresponding decoder and de-emphasis filter. The device comprises:

- A pre-emphasis filter H_pre(z), which processes the original input speech signal s_o(n) into the pre-emphasized speech signal s(n). Its coefficients are the p coefficients {a_i, i = 1, 2, ..., p} obtained by a p-th-order linear predictor (LPC) from previously decoded speech frames. The transfer function H_pre(z) is given as equation (10) in section (3).

- An adaptive predictor, which uses the adaptive codebook and a least-squares-error criterion to find the pitch parameters i and j and the gain parameters β1 and β2.

- A subtractor, which subtracts from the pre-emphasized speech signal s(n) the predicted signal ŝ(n) formed from the above pitch and gain parameters, giving the residual signal x(n).

- A DCT transformer, which receives the residual signal from the subtractor, takes a block of new samples as the input frame, extends it with M overlapping samples in the time domain to build an N-point transform frame, and converts that frame into N DCT coefficients X[k]. When the transform frame is built, the following frame window function w[n] is applied:

$$w[n] = \begin{cases} \dfrac{2n+1}{2M}, & 0 \le n \le M-1 \\[4pt] 1, & M \le n \le N-M-1 \\[4pt] \dfrac{2(N-n)-1}{2M}, & N-M \le n \le N-1 \end{cases}$$

- A coefficient selector, which receives the N DCT coefficients and, based on the masking effect of human hearing and on the ear's tolerance to frequency shifts, divides them in the frequency domain into five regions of different resolution. According to these perceptual characteristics it selects the absolute values of a number of coefficients to form a DCT coefficient vector, which gives the base-rate coding and the higher-rate codings.

- A quantizer, which quantizes the selected base-rate DCT coefficient vector by weighted vector quantization to obtain the best codeword index k* and the quantized codeword gain β3. The coefficients selected for the higher rates are scalar-quantized with reference to the already quantized base-rate coefficients.

The five regions A, B, C, D, and E used by the coefficient selector have the following bandwidths: A and B are 250 Hz each, C is 500 Hz, D is 1 kHz, and E is 2 kHz. The four rates are built up as follows:

- Base rate: in each of regions A, B, C, and D, the DCT coefficient with the largest absolute value is selected, and the four values are coded by vector quantization.

- Low rate: one additional DCT coefficient is coded in each of regions A and B. It is chosen with reference to the auditory masking effect, normalized by the largest DCT value in its region, and scalar-quantized with 3 bits.

- Middle rate: one additional DCT coefficient is coded in each of regions C and D, chosen, normalized, and quantized in the same way.

- High rate: on top of the middle rate, 4 DCT coefficients in region E are coded. Their positions are chosen according to the auditory masking effect and coded with a 3-bit position index: the index selects, among the position groups of Table 1, the group whose DCT magnitudes have the largest sum. The magnitudes are coded with a 4-bit codebook index, and the codeword gain is coded with 3 bits by equal-probability scalar quantization.

Table 1 lists the position groups:

Table 1. DCT position coding for region E

Position index    Positions (DCT index k)
000               32, 38, 44, 50
001               34, 40, 46, 52
010               36, 42, 48, 54
011               39, 45, 51, 57
100               41, 47, 53, 59
101               43, 49, 55, 61
110               45, 51, 56, 62
111               48, 55, 58, 60

- A parameter multiplexer, which combines the pitch parameters i and j, the pitch gain parameters β1 and β2, the DCT codeword index k*, the DCT quantization gain β3, and, at the higher rates, the quantized additional DCT coefficients into one compressed data stream. This completes the multi-rate speech compression.

- A parameter demultiplexer, which separates the compressed data stream into the pitch parameters i and j, the pitch gain parameters β1 and β2, the DCT codeword index k*, the signs of the selected DCT coefficients, the DCT quantization gain β3, and, at the higher rates, the quantized additional DCT coefficients. This begins the speech decompression.

- A DCT composer, which first rebuilds the base-rate DCT coefficients from the codeword index k*, the signs, and the gain β3 delivered by the demultiplexer, and then inverse-quantizes the additional coefficients. All remaining coefficients are set to zero, which completes the reconstruction of the N DCT coefficients.

- An inverse DCT (IDCT), which converts the rebuilt N DCT coefficients into the N-point reconstructed frame residual. Its first M points are added to the overlapping points of the previous reconstructed frame to rebuild the residual signal.

- A predictor generator, which uses the pitch parameters and pitch gains delivered by the demultiplexer to produce exactly the same adaptive codewords as the encoder. The two codewords C_{i*} and C_{j*} are consecutive, and with their gains β1 and β2 they give the predicted speech vector

$$\hat{s} = \beta_1 C_{i^*} + \beta_2 C_{j^*}, \qquad j^* = i^* \pm 1$$

where the best pitch parameter is

$$i^* = \arg\max_i \frac{(S C_i^{T})^2}{C_i C_i^{T}}$$

and j* = i* - 1 if

$$\frac{(S C_{i^*-1}^{T})^2}{C_{i^*-1} C_{i^*-1}^{T}} > \frac{(S C_{i^*+1}^{T})^2}{C_{i^*+1} C_{i^*+1}^{T}}$$

and j* = i* + 1 otherwise. The gains β1 and β2 are obtained by least-squares error. The best β1 is quantized by equal-probability scalar quantization; β2 is first transformed (the transform is illegible in the source) and the transformed value is then scalar-quantized.
- An adder, which adds the reconstructed frame residual and the predicted signal ŝ(n) to rebuild the pre-emphasized speech signal.

- A de-emphasis filter H_de(z), which processes the rebuilt pre-emphasized speech signal into the final reconstructed speech signal, completing the speech decompression.

The technical structure of the invention is described further below.

(1) Codec block diagram

Figures 1 and 2 show the basic structure of the proposed variable-rate speech encoder and decoder, which apply auditory characteristics to residual transform coding. The coding algorithm is built around the N-point DCT of one frame. To reduce the noise caused by the blocking effect of the DCT, consecutive frames overlap by M speech samples, as shown in Figure 3. To reduce the noise introduced by transform coding, the encoder uses the pre-emphasis filter 100 and the decoder uses the de-emphasis filter 180, as shown in Figures 1 and 2.

At the encoder, the original speech passes through the pre-emphasis filter, and the pitch prediction signal is subtracted from it to give the residual signal, which the DCT 110 converts into DCT coefficients. Using the psychoacoustic properties of the human ear, the larger, audible DCT coefficients of the N-point frame are selected for quantization and coding. At the decoder, inverse vector quantization and the inverse DCT 130 give the reconstructed residual, which is added to the prediction signal and passed through the de-emphasis filter 180 to give the synthesized speech. Each stage is described in the sections that follow.
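The overlap between frames can be illustrated with a short sketch. It assumes the trapezoidal window reconstructed from the source (linear ramps of M samples around a flat top) and a hop of N - M samples between frames; N = 64 matches the 64-point DCT of Figure 4, while M = 8 and the helper names are illustrative choices, not values from the patent.

```python
def frame_window(N, M):
    """Trapezoidal frame window: M-sample linear ramps around a flat top."""
    w = []
    for n in range(N):
        if n < M:                      # rising ramp, 0 <= n <= M-1
            w.append((2 * n + 1) / (2 * M))
        elif n < N - M:                # flat part, M <= n <= N-M-1
            w.append(1.0)
        else:                          # falling ramp, N-M <= n <= N-1
            w.append((2 * (N - n) - 1) / (2 * M))
    return w


def overlap_add(frames, N, M):
    """Window each N-sample frame and add it at a hop of N - M samples."""
    hop = N - M
    w = frame_window(N, M)
    out = [0.0] * (hop * (len(frames) - 1) + N)
    for f, frame in enumerate(frames):
        for n in range(N):
            out[f * hop + n] += w[n] * frame[n]
    return out


N, M = 64, 8                           # M = 8 is an assumed overlap length
w = frame_window(N, M)
# The falling ramp of one frame and the rising ramp of the next sum to one.
ramp_sums = [w[n] + w[N - M + n] for n in range(M)]
# Three constant frames: everything except the outer ramps rebuilds to 1.0.
rebuilt = overlap_add([[1.0] * N for _ in range(3)], N, M)
```

Because the two ramps are complementary, a signal that is windowed at the encoder can be rebuilt at the decoder by plain addition of the overlapping samples, which is how the IDCT stage above is described.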
Description of the invention (8) (2) Second-order adaptive estimator standard, the adaptive estimator is used Least square error standard 140, 128 codewords are stored in the format of 二 2 / recommendation codebook Fengli Stealth®. These characters are overtone signals, and the codebook is from 0 · to 127 ni base cycle delay '㈣ sampled from 20 to 147, that is, the base cycle delay, index 1 corresponds to the base cycle delay of 21, and I = the base cycle delay is approximately the same, this assumption is reasonable. For non-integer base cycle delays, the present invention ... ^ i = PlC / + P2Cy, for j = i ± l ⑴ Mainly continuous ㈣ characters whose codeword gain can be seen in the estimated signal vector center by two The physical meaning of a consecutive codeword is that the two codewords are interpolated. Therefore, the day t 疋 belongs to the main base cycle delay, and the index ′ is the second soil delay. In order to reduce the calculation amount, the search mainly refers to the first-order estimation standard, as follows: ^ J? AT ^ 1 (2)% green ban, 一 / C / C / ίϊίί emphasizes the sound frame signal, calculates the marriage value with u 1¾ in the codebook, and calculates = if ㊇.- 丨) yp / * + 1 if cr- icr- \ Ci \ icr + iς · + Ις · + Ι (3) ΙΒΙ Page 15 2002 · 05.08.015 507194 V. Description of the invention (9) ίίί! = ί5ί: ί? ί Sum *, the present invention | | E || 2 = IIS-plCr-P2Cyll2 —y2 is also 2 邛%, gas · + 2p, w ⑷ I is partially differentiating A and A individually, and let the equation be zero to get y5), ⑹2 ^ (β2 || €, || 2 ^ 0 / + βι € ι, €,) =: 〇, 5βι mi! 5β2 ^ C_ = 2n3, IIC..II2U, r- 广, Λ (5) σΡ2 JJ · I j, ”最佳 The best following the second base cycle gains 1 · and A · which are represented as follows ^ ^ (SC.XIIC.-I ^^ CX ^ CSC,) 1 one (7) = (8 ^) (|| < ^ || 2)-((^ ·)% ·) 2 (8) = ¾Wen, so the coding is as follows: -order estimator, but the second i-base is 4¾¾¾, and then the machine face quantity is quantified. 
After second-order prediction, the residual vector is

$$x = S - \hat{\beta}_1 C_{i^*} - \hat{\beta}_2 C_{j^*} \tag{9}$$

where β̂1 and β̂2 are the quantized values of β1* and β2*.

(3) Pre-emphasis and de-emphasis filters

To improve the quality of the synthesized speech, each frame of speech at the encoder is first pre-emphasized. The adaptive prediction signal is then subtracted from it to give the N-point residual, which is converted into N DCT coefficients. At the decoder, the inverse DCT gives the frame residual, which is added to the pitch prediction signal to give the N-point frame excitation signal; the de-emphasis filter 180 then produces the synthesized speech. The transfer functions of the two filters are

$$H_{pre}(z) = \frac{1 - \sum_{i=1}^{p} a_i b^i z^{-i}}{1 - \sum_{i=1}^{p} a_i c^i z^{-i}} \tag{10}$$

$$H_{de}(z) = \frac{1 - \sum_{i=1}^{p} a_i c^i z^{-i}}{1 - \sum_{i=1}^{p} a_i b^i z^{-i}} \tag{11}$$

The printed formulas are only partly legible in the source. The pole-zero form above is reconstructed from the surviving fragments, and c stands for the second weighting factor, whose symbol cannot be read.

The coefficients a_i of the pre-emphasis and de-emphasis filters are obtained from previously decoded speech frames, which are combined into one signal and analyzed by a 10th-order LPC (p = 10). The frequency response of the filters therefore changes with each decoded frame, so the filters can suppress noise caused by quantization or by other factors. The two weighting factors can be chosen as needed. Because the filter parameters are derived from decoded speech, which is available at both ends, the encoder does not need to transmit them.

(4) Quantization of DCT coefficients using auditory characteristics

To improve the quality of the synthesized speech, the invention uses auditory characteristics to quantize the DCT coefficients. The DCT is chosen because it has near-optimal energy compaction, needs less computation than the optimal Karhunen-Loève transform (KLT), and has many fast algorithms. The DCT and IDCT are defined as follows (the scale factors are illegible in the source; the standard orthonormal form is shown):

$$X[k] = \sqrt{\frac{2}{N}}\; c[k] \sum_{n=0}^{N-1} x[n] \cos\frac{(2n+1)k\pi}{2N}, \qquad k = 0, 1, \ldots, N-1 \tag{12}$$

$$x[n] = \sqrt{\frac{2}{N}} \sum_{k=0}^{N-1} c[k]\, X[k] \cos\frac{(2n+1)k\pi}{2N}, \qquad n = 0, 1, \ldots, N-1 \tag{13}$$

$$c[k] = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & \text{otherwise} \end{cases} \tag{14}$$
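The transform pair can be checked with a direct, unoptimized sketch. It assumes the orthonormal DCT-II scaling; the function names and the test signal are illustrative, and a real-time coder would use one of the fast algorithms instead of these O(N²) loops.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a list of N samples (direct O(N^2) form)."""
    N = len(x)
    X = []
    for k in range(N):
        c = 1 / math.sqrt(2) if k == 0 else 1.0
        acc = sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                  for n in range(N))
        X.append(math.sqrt(2 / N) * c * acc)
    return X


def idct(X):
    """Inverse of dct(): rebuilds the N samples from the N coefficients."""
    N = len(X)
    x = []
    for n in range(N):
        acc = 0.0
        for k in range(N):
            c = 1 / math.sqrt(2) if k == 0 else 1.0
            acc += c * X[k] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
        x.append(math.sqrt(2 / N) * acc)
    return x


N = 64
residual = [math.sin(0.3 * n) + 0.01 * n for n in range(N)]   # toy residual
coeffs = dct(residual)
rebuilt = idct(coeffs)
max_error = max(abs(a - b) for a, b in zip(residual, rebuilt))
energy_in = sum(v * v for v in residual)
energy_out = sum(v * v for v in coeffs)
```

With this scaling the transform is orthonormal, so the round trip reproduces the input and the energy of the coefficients equals the energy of the samples. That property is what lets the coder judge the importance of a coefficient by its magnitude.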

In equations (12) and (13), x[n] is the n-th sample of the residual signal and X[k] is the k-th DCT coefficient.

Based on auditory characteristics and on the statistical properties of the DCT coefficients, the invention builds an efficient coding technique to quantize these coefficients. The coding has two parts: one codes the frequency index, and the other codes the corresponding quantized magnitude. Both are described below.

A. Frequency index coding of DCT coefficients

In terms of frequency resolution, the low-frequency components carry more energy than the high-frequency components. More bits are therefore spent on quantizing and coding the low-frequency DCT coefficients, and fewer bits on the high-frequency coefficients. From the viewpoint of energy distribution in the transform domain, low-frequency components matter more than high-frequency ones; at the same time, the human ear is more sensitive to frequency shifts in the low-frequency range. The invention therefore codes the DCT coefficients with a different frequency resolution in each frequency range.
Each frame has N residual samples, so the residual signal also has N DCT coefficients. According to the frequency index, the N DCT coefficients are divided into five regions of different resolution, as shown in Figure 4, and the coefficients in different regions are coded with different numbers of bits. By the masking effect of human hearing, a high-energy component masks the neighbouring low-energy components, so within each region the DCT coefficient with the largest energy is selected for coding. The ear's tolerance to frequency shifts is another important psychoacoustic property that the coder uses: the human auditory system is more sensitive to shifts of low-frequency components than to shifts of high-frequency components. At a sampling rate of 8 kHz, the N-point DCT spans a band of 4 kHz. Because the low-band DCT coefficients matter more to the auditory system, the low band is divided into narrower regions, as shown in Figure 4: regions A and B are 250 Hz each, region C is 500 Hz, region D is 1 kHz, and region E is 2 kHz.
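The partition and the base-rate selection can be sketched as below. The sketch assumes the 64-point DCT of Figure 4 and maps coefficient k to the frequency k·fs/(2N), which gives 62.5 Hz per coefficient at 8 kHz; the function names and the toy coefficient values are illustrative only.

```python
def region_bounds(N=64, fs=8000):
    """Map regions A-E (250, 250, 500, 1000, 2000 Hz wide) to DCT index ranges."""
    hz_per_bin = (fs / 2) / N                  # 62.5 Hz for N = 64, fs = 8 kHz
    edges_hz = [0, 250, 500, 1000, 2000, 4000]
    return {name: range(int(lo / hz_per_bin), int(hi / hz_per_bin))
            for name, lo, hi in zip("ABCDE", edges_hz, edges_hz[1:])}


def select_base_rate(X):
    """Pick the largest-magnitude coefficient in each of regions A, B, C, D."""
    regions = region_bounds(len(X))
    picks = []
    for name in "ABCD":
        k = max(regions[name], key=lambda idx: abs(X[idx]))
        picks.append((name, k, abs(X[k]), X[k] >= 0))   # index, magnitude, sign
    return picks


bounds = region_bounds()

# Toy coefficient vector: mostly small values with one clear peak per region.
X = [0.1] * 64
X[2], X[5], X[11], X[20] = 9.0, -7.0, 4.0, -2.5
picks = select_base_rate(X)
```

For N = 64 the regions come out as indices 0 to 3 (A), 4 to 7 (B), 8 to 15 (C), 16 to 31 (D), and 32 to 63 (E), which agrees with the positions listed for region E in Table 1. Each pick carries an index, a magnitude, and a sign, matching the way the text splits a coefficient for coding.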

Depending on the available bandwidth and the required speech quality, the DCT coefficients in these five regions are coded selectively. At 5.4 kbps, only the four largest-magnitude DCT values are coded, one from each of regions A, B, C, and D. When higher quality is required, one additional DCT coefficient is first coded in each of regions A and B: following the masking effect of human hearing and taking the region's largest coefficient as the reference, the coefficient at the relative position given by equation (15) is coded to improve the synthesized speech. For example, if X[2] is the largest DCT coefficient in region A, the additional coefficient is X[0]. This gives the 6.4 kbps rate. If the required quality is higher still, one additional DCT coefficient is coded in each of regions C and D in the same way as in A and B, which gives 7.5 kbps. At 9.4 kbps, region E is added on top of the 7.5 kbps coding: following the masking effect, the N/2 coefficients of region E are arranged into 8 groups of 4 DCT coefficients each, and one of the 8 groups is selected and identified by its position index, as listed in Table 1.

$$\hat{k} = \begin{cases} k+2, & k = 0, 1, 4, 5 \\ k-2, & k = 2, 3, 6, 7 \\ k+4, & 8 \le k \le 11 \\ k-4, & 12 \le k \le 15 \\ k+8, & 16 \le k \le 23 \\ k-8, & 24 \le k \le 31 \end{cases} \tag{15}$$

(The case for 12 ≤ k ≤ 15 is illegible in the source; k - 4 is assumed from the symmetry of the other cases.)

B. Vector and adaptive quantization of DCT amplitudes

For the magnitude coding, the amplitudes of the DCT coefficients at the frequency indices selected by the principles above are coded and sent to the decoder, which rebuilds the signal with the inverse DCT 130. The absolute values of the selected coefficients are quantized by weighted quantization, each with its own weighting factor. In the transform domain the low-frequency components dominate, so they are weighted more heavily, and the weight on the DCT coefficient magnitude decreases gradually as the frequency increases.

For convenience and efficiency, the amplitude of a DCT coefficient is represented as an absolute magnitude and a sign; in other words, the sign is coded separately from the magnitude. Let X[k] be the DCT coefficient at frequency index k, where k ranges from 0 to N-1. Suppose the coded frequency indices are k_1, ..., k_4. The magnitude vector is then x = [x_1, x_2, x_3, x_4], where x_j = |X[k_j]| is the magnitude of the j-th selected DCT coefficient. Let m_k = [m_{k1}, ..., m_{k4}] be the k-th codeword of the DCT magnitude codebook. With weighted vector quantization, the distance measure is

$$d_k = \sum_{j=1}^{4} w_j \left(x_j - \beta_3 m_{kj}\right)^2 \tag{16}$$

where w_j is the error weighting factor and β3 is the gain of the DCT codeword magnitude. The weighting factors follow the auditory characteristics, with w_1 the weight of the lowest frequency. Minimizing the distance gives the best-matching codeword

$$k^* = \arg\max_k \frac{\left(\sum_{j=1}^{4} w_j x_j m_{kj}\right)^2}{\sum_{j=1}^{4} w_j m_{kj}^2} \tag{17}$$

and the corresponding best gain

$$\beta_3 = \frac{\sum_{j=1}^{4} w_j x_j m_{k^*j}}{\sum_{j=1}^{4} w_j m_{k^*j}^2} \tag{18}$$
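Equations (16) to (18) amount to a weighted gain-shape search, which can be sketched as follows. The three-entry codebook, the weights, and the input vector are made-up illustrative values; the patent's trained codebook and its actual weights are not given in the source.

```python
def weighted_vq(x, codebook, w):
    """Return (k*, beta3): the codeword index maximizing equation (17)
    and the matching least-squares gain of equation (18)."""
    best_k, best_score, best_gain = -1, -1.0, 0.0
    for k, m in enumerate(codebook):
        num = sum(wj * xj * mj for wj, xj, mj in zip(w, x, m))
        den = sum(wj * mj * mj for wj, mj in zip(w, m))
        if den <= 0:
            continue                         # skip an all-zero codeword
        match = num * num / den              # equation (17)
        if match > best_score:
            best_k, best_score, best_gain = k, match, num / den   # (18)
    return best_k, best_gain


def distortion(x, m, gain, w):
    """Weighted squared error of equation (16)."""
    return sum(wj * (xj - gain * mj) ** 2 for wj, xj, mj in zip(w, x, m))


# Illustrative values only: weights fall with frequency, as the text describes.
weights = [4.0, 3.0, 2.0, 1.0]
codebook = [[1.0, 0.0, 0.0, 0.0],
            [1.0, 1.0, 1.0, 1.0],
            [4.0, 3.0, 2.0, 1.0]]
magnitudes = [8.0, 6.0, 4.0, 2.0]            # |X[k]| of the four selected bins

k_star, beta3 = weighted_vq(magnitudes, codebook, weights)
error = distortion(magnitudes, codebook[k_star], beta3, weights)
```

Maximizing the ratio in equation (17) is equivalent to minimizing equation (16) once the optimal gain of equation (18) is substituted, so the search needs no explicit distortion computation. In this toy case the input is exactly twice the third codeword, so that codeword is chosen with a gain of 2 and zero error.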

The DCT magnitude codebook has dimension 4, and its index is coded with 5 bits, as far as the source can be read.

For the 6.4 kbps coding, suppose X[2] is the largest DCT coefficient in region A, so that the additional coefficient is X[0]. The magnitude of X[0] is normalized as

$$K_a = \frac{|X[0]|}{|X[2]|}$$

Because X[2] is the largest DCT value in region A, K_a is less than 1, so |X[0]| is easily represented through K_a. K_a is quantized with 3 bits by equal-probability scalar quantization, and |X[0]| is rebuilt as

$$|\hat{X}[0]| = \hat{K}_a\, |\hat{X}[2]|$$

where the hat denotes a quantized value. The bit allocation is given in Table 2.

The 7.5 kbps coding is based on the 6.4 kbps codec; the additional DCT coefficients in regions C and D have their magnitudes coded in the same way as those in regions A and B. The 9.4 kbps coding is based on the 7.5 kbps coder: the 4 DCT coefficients extracted from region E form a vector that is quantized with a codebook of dimension 4, whose index is coded with 4 bits, and the codeword gain is coded with 3 bits by equal-probability scalar quantization.
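The coding of one additional coefficient can be sketched as below. The index mapping follows equation (15), with the 12 to 15 case inferred by symmetry. The source specifies an equal-probability 3-bit quantizer but does not give its levels, so a uniform 3-bit quantizer stands in for it here; that substitution, the function names, and the sample values are assumptions.

```python
def companion_index(k):
    """Relative position of the additional coefficient, equation (15).
    The k - 4 case for 12 <= k <= 15 is inferred, not read from the source."""
    if 0 <= k < 8:                       # regions A and B
        return k + 2 if k % 4 < 2 else k - 2
    if k < 16:                           # region C
        return k + 4 if k < 12 else k - 4
    if k < 32:                           # region D
        return k + 8 if k < 24 else k - 8
    raise ValueError("equation (15) covers indices 0..31 only")


def quantize_ratio(ratio, bits=3):
    """Uniform quantizer on [0, 1]; a stand-in for the equal-probability one."""
    levels = 1 << bits
    return min(levels - 1, int(ratio * levels))


def dequantize_ratio(level, bits=3):
    return (level + 0.5) / (1 << bits)   # midpoint of the quantizer cell


# Region A example from the text: X[2] is the largest, so X[0] is added.
X = {0: -5.0, 1: 0.4, 2: 10.0, 3: 1.2}
k_max = max(X, key=lambda k: abs(X[k]))
k_add = companion_index(k_max)
level = quantize_ratio(abs(X[k_add]) / abs(X[k_max]))
rebuilt_magnitude = dequantize_ratio(level) * abs(X[k_max])
```

The mapping is its own inverse and never leaves the region of the reference coefficient, so the decoder can recover the position of the additional coefficient from the index it already has, and only the 3-bit ratio needs to be sent for it.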

(5) Real-time implementation and objective quality evaluation

To evaluate the synthesized speech quality and the computational complexity, the encoder and the decoder are integrated on a single TMS320C30 for real-time operation. The complexity of base-rate coding is low; the complexity evaluation is given in Table 3. Synthesized speech quality is evaluated with the per-frame signal-to-noise ratio, comparing the variable-rate speech coder with a traditional 4.8 kbps coder. The computational load of the proposed algorithm is small, and according to Table 4 all four implemented bit rates compare favorably with the 4.8 kbps coder.

The variable-rate speech coder needs only a small amount of extra computation when the transmission bit rate changes, and it does not require any change to the overall system architecture, so it is well suited to implementation on a low-cost DSP chip.
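A per-frame signal-to-noise measure of the kind mentioned above is commonly computed as a segmental SNR. The sketch below assumes that definition (the mean of the per-frame SNR values in dB); the patent text does not give its exact formula, and the frame length and test signals are hypothetical.

```python
import math


def frame_snr_db(original, decoded):
    """SNR of one frame in dB, or None when it is undefined."""
    signal = sum(s * s for s in original)
    noise = sum((s - d) ** 2 for s, d in zip(original, decoded))
    if signal == 0.0 or noise == 0.0:
        return None  # silent frame or perfect reconstruction
    return 10.0 * math.log10(signal / noise)


def segmental_snr_db(original, decoded, frame_len):
    """Mean of the per-frame SNR values over all complete frames."""
    values = []
    for start in range(0, len(original) - frame_len + 1, frame_len):
        snr = frame_snr_db(original[start:start + frame_len],
                           decoded[start:start + frame_len])
        if snr is not None:
            values.append(snr)
    return sum(values) / len(values) if values else None


# Hypothetical signals: a constant input decoded with a 10 % amplitude error.
original = [1.0] * 8
decoded = [0.9] * 8
seg_snr = segmental_snr_db(original, decoded, frame_len=4)
```

With a 10 % amplitude error in every sample, each frame has a signal-to-noise power ratio of about 100, so the result is close to 20 dB.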

Brief description of the tables:
Table 1: DCT position coding table for region E.
Table 2: Bit allocation table for the four rates of the invention.
Table 3: Computational complexity analysis of the variable-rate speech coder of the invention.
Table 4: Quality comparison of the variable-rate speech coder of the invention.

Brief description of the drawings:
Fig. 1: Structure of the variable-rate speech encoder.
Fig. 2: Structure of the variable-rate speech decoder.
Fig. 3: Frame window function.
Fig. 4: 64-point DCT divided into five frequency regions of different resolution.

Description of reference numerals:
100: Pre-emphasis filter H_pre(z)
Other labeled components: subtractor; discrete cosine transform (DCT); DCT quantization; inverse discrete cosine transform (IDCT); pitch prediction gains β1, β2; adaptive codebook; residual signal memory; speech memory; linear prediction coefficient analysis; de-emphasis filter H_de(z); parameter multiplexer; parameter demultiplexer; adder; multiplier.
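The five-region split of the 64-point DCT shown in the fourth figure can be made concrete with a small Python sketch. It uses the region bandwidths stated in the claims (250 Hz, 250 Hz, 500 Hz, 1 kHz, 2 kHz) and assumes that the 64 DCT bins divide the resulting 4 kHz band into equal slices of 62.5 Hz each, which implies 8 kHz sampling. That bin-to-frequency mapping and the function name are assumptions, not statements from the patent.

```python
def region_bins(n_points=64, bandwidths_hz=(250, 250, 500, 1000, 2000), names="ABCDE"):
    """Map each region name to the range of DCT bin indices it covers.

    Assumption: the bins split the total bandwidth into equal slices.
    """
    hz_per_bin = sum(bandwidths_hz) / n_points
    regions = {}
    start = 0
    for name, bandwidth in zip(names, bandwidths_hz):
        count = round(bandwidth / hz_per_bin)
        regions[name] = range(start, start + count)
        start += count
    return regions


regions = region_bins()
for name, bins in regions.items():
    print(name, bins.start, bins.stop - 1)
```

Under these assumptions region E covers bins 32 to 63, which is consistent with the positions listed in Table 1, all of which lie between 32 and 62.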

Claims (1)

1. A variable-rate residual-transform speech codec using auditory perception characteristics, the apparatus comprising:
a pre-emphasis filter that processes the original input speech signal into a pre-emphasized speech signal;
a second-order adaptive predictor that uses an adaptive codebook and a least-squares criterion to obtain a pitch period parameter and pitch gain parameters β1 and β2, and uses a subtractor to subtract from the pre-emphasized speech signal the prediction signal formed from the pitch period parameter and the gain parameters, yielding a residual signal;
a discrete cosine transformer (DCT) that receives the residual signal from the subtractor, takes every (N-M) samples as an input frame, overlaps M samples in the time domain to build an N-point transform frame, and transforms it into N-point DCT coefficients X[k];
an auditory-characteristic DCT coefficient selector that receives the N-point DCT coefficients X[k] and, based on the characteristics of human hearing, the tolerance of human hearing to frequency variation, and the statistical properties of the DCT coefficients, divides the N coefficients into five frequency regions of different resolution and selects, according to the perceptual properties of the ear, coefficients whose absolute values form a DCT coefficient vector x = {x_1, x_2, ...}, providing the DCT coefficients for base-rate coding and for higher-rate coding;
a DCT quantizer that applies weighted vector quantization to the selected base-rate DCT coefficient vector to obtain the best codeword index (codebook index) and the best quantized codeword gain (quantized codebook gain), and applies scalar quantization to the selected higher-rate DCT coefficients with the quantized base-rate DCT coefficients as reference;
a parameter multiplexer that combines the pitch period parameter, the pitch gain parameters β1 and β2, the DCT codeword index, the DCT quantized gain, and the quantized additional DCT coefficients sent at the higher rates into a compressed data stream, completing multi-rate speech compression;
a parameter demultiplexer that separates the compressed data stream into the pitch period parameter, the pitch gain parameters β1 and β2, the DCT codeword index, the signs of the DCT coefficients, the DCT quantized gain, and the quantized additional coefficients sent at the higher rates, for speech decompression;
a DCT coefficient reconstructor that, from the DCT codeword index, the DCT signs, and the DCT quantized gain recovered by the parameter demultiplexer, first reconstructs the base-rate DCT coefficients, then dequantizes the additional DCT coefficients, and sets the remaining coefficients to zero, completing the reconstruction of the N-point DCT coefficients;
an inverse discrete cosine transformer (IDCT) that converts the reconstructed N-point DCT coefficients into a reconstructed frame of the residual signal and overlap-adds its first M points with the last M points of the previous reconstructed frame to reconstruct (N-M) points of the residual signal;
a second-order prediction generator that, from the pitch period parameter and the pitch gain parameters β1 and β2 recovered by the parameter demultiplexer, generates a prediction signal identical to that of the encoder;
an adder that adds the reconstructed residual signal and the prediction signal to reconstruct the pre-emphasized speech signal; and
a de-emphasis filter H_de(z) that processes the reconstructed pre-emphasized speech signal into the final reconstructed speech signal, completing speech decompression.

2. The variable-rate residual-transform speech codec using auditory perception characteristics of claim 1, wherein the coefficients of the pre-emphasis filter H_pre(z) are formed from the p coefficients {a_i, i = 1, 2, ..., p} obtained by applying a p-th order linear predictor (LPC) to the speech signal of the frame. [The defining expression for H_pre(z) is illegible in the source.]

3. The variable-rate residual-transform speech codec using auditory perception characteristics of claim 1, wherein the second-order adaptive predictor uses two consecutive adaptive codebook vectors and their corresponding codeword gains β1 and β2 to generate a predicted speech vector, the lag of the second vector being the best pitch period minus 1 or plus 1, chosen by comparing the normalized correlation measures of the two neighboring candidates. [The defining expressions are illegible in the source.]

4. The variable-rate residual-transform speech codec using auditory perception characteristics of claim 1, wherein the gains β1 and β2 are obtained by minimizing the squared error.

5. The variable-rate residual-transform speech codec using auditory perception characteristics of claim 1, wherein the coefficient selection for the DCT quantizer divides the frequency domain into five regions A, B, C, D, and E of different resolution, regions A and B each having a bandwidth of 250 Hz, region C 500 Hz, region D 1 kHz, and region E 2 kHz; at the base rate, the DCT coefficient of largest absolute value in each of regions A, B, C, and D is selected and coded with a gain and a vector codeword; at the low rate, one additional DCT coefficient in each of regions A and B is also coded; at the middle rate, one additional DCT coefficient in each of regions C and D is also coded; and at the high rate, four DCT coefficients in region E are coded in addition to the middle-rate coding.

6. The variable-rate residual-transform speech codec using auditory perception characteristics of claim 1, wherein, when the input frame is extended by M overlapping points in the time domain to build the N-point transform frame, a frame window function w[n] is applied:
w[n] = 1, for M ≤ n ≤ N-M-1;
w[n] = (2n+1)/2M, for 0 ≤ n ≤ M-1;
w[n] = (2(N-n)-1)/2M, for N-M ≤ n ≤ N-1.

7. The variable-rate residual-transform speech codec using auditory perception characteristics of claim 1, wherein the DCT quantizer, at the base rate, uses a 4-dimensional vector codebook whose codeword index is coded with 5 bits, and the codeword gain is quantized with a 4-bit equal-probability scalar quantizer.

8. The variable-rate residual-transform speech codec using auditory perception characteristics of claim 1, wherein the DCT quantizer, at the low rate, builds on the coding structure of claim 7 and additionally selects, with the auditory masking effect as reference, one additional DCT coefficient in each of regions A and B, normalizes it by the largest DCT value in that region, and quantizes it with a 3-bit scalar quantizer.

9. The variable-rate residual-transform speech codec using auditory perception characteristics of claim 1, wherein the DCT quantizer, at the middle rate, builds on the coding structure of claim 8 and additionally selects, with the auditory masking effect as reference, one additional DCT coefficient in each of regions C and D, normalizes it by the largest DCT value in that region, and quantizes it with a 3-bit scalar quantizer.

10. The variable-rate residual-transform speech codec using auditory perception characteristics of claim 1, wherein the DCT quantizer, at the high rate, builds on the coding structure of claim 9 and additionally extracts four DCT coefficients in region E according to the auditory masking effect, their positions being coded with 3 bits as shown in Table 1:

Table 1
Position index    Positions in region E
000               32, 38, 44, 50
001               34, 40, 46, 52
010               36, 42, 48, 54
011               39, 45, 51, 57
100               41, 47, 53, 59
101               43, 49, 55, 61
110               45, 51, 56, 62
111               48, 55, 58, 60
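The frame window of claim 6 can be checked numerically. The Python sketch below builds the window and verifies that the falling edge of one frame and the rising edge of the next sum to one in the overlap, which is what the overlap-add step in the decoder relies on. The falling-edge expression is garbled in the source, so the form used here is a reconstruction chosen to mirror the rising edge and should be read as an assumption; N = 64 follows the 64-point DCT of the fourth figure, while M = 8 is a hypothetical value.

```python
def frame_window(n_points, overlap):
    """Trapezoidal frame window with linear ramps of length `overlap`."""
    window = []
    for n in range(n_points):
        if n < overlap:
            # Rising edge: (2n + 1) / 2M for 0 <= n <= M - 1.
            window.append((2 * n + 1) / (2 * overlap))
        elif n < n_points - overlap:
            # Flat part: 1 for M <= n <= N - M - 1.
            window.append(1.0)
        else:
            # Falling edge (assumed mirror of the rising edge).
            window.append((2 * (n_points - n) - 1) / (2 * overlap))
    return window


N, M = 64, 8   # M is a hypothetical overlap length
hop = N - M    # each new frame advances by N - M samples
window = frame_window(N, M)

# Tail of one frame plus head of the next frame over the M overlapping samples.
overlap_sums = [window[n + hop] + window[n] for n in range(M)]
```

Every entry of `overlap_sums` equals one, so a signal that passes through windowing and overlap-add unchanged is reconstructed without amplitude modulation at the frame boundaries.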

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW89110061A TW507194B (en) 2000-05-24 2000-05-24 Variable-rate residual-transform vocoders using auditory perception approximation

Publications (1)

Publication Number Publication Date
TW507194B (en) 2002-10-21

Family

ID=27621650

Country Status (1)

Country Link
TW (1) TW507194B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI427621B (en) * 2004-11-30 2014-02-21 Agere Systems Inc Method, apparatus and machine-readable medium for encoding audio channels and decoding transmitted audio channels
TWI405185B (en) * 2007-12-13 2013-08-11 Qualcomm Inc Fast algorithms for computation of 5-point dct-ii, dct-iv, and dst-iv, and architectures
US8631060B2 (en) 2007-12-13 2014-01-14 Qualcomm Incorporated Fast algorithms for computation of 5-point DCT-II, DCT-IV, and DST-IV, and architectures

Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent
MK4A Expiration of patent term of an invention patent