TWI246256B

TWI246256B - Apparatus for audio compression using mixed wavelet packets and discrete cosine transformation

Info

Publication number: TWI246256B
Application number: TW93120054A
Authority: TW
Inventors: Pao-Chi Chang; Tsung-Hung Wu
Original assignee: Univ Nat Central
Priority date: 2004-07-02
Filing date: 2004-07-02
Publication date: 2005-12-21
Also published as: TW200603553A

Abstract

An apparatus for audio compression using mixed wavelet packet and discrete cosine transformation techniques is provided. The apparatus uses wavelet packet frequency division to split the audio signal into a plurality of subbands through a bank of filters. Then, based on the flatness in the time domain and in the frequency domain, the apparatus determines whether to apply the discrete cosine transformation. Using an optimal bit allocation algorithm for non-ideal synthesis filters, the apparatus converts the minimum frequency-domain masking threshold derived from the psychoacoustic model into a masking threshold in the wavelet domain, which provides an accurate quantization guideline. A quantizer, guided by the wavelet-domain masking threshold, then greatly reduces the amount of data while maintaining high audio quality. Finally, entropy coding is applied to the quantized coefficients, which are packed into a bit stream. As a result, a better wavelet-domain masking threshold can be obtained, and higher time resolution or higher frequency resolution can be selected to match different music characteristics and achieve higher music quality.

Description

[Technical field of the invention]

The present invention relates to an audio compression apparatus using mixed wavelet packets and discrete cosine transformation, and in particular to one that obtains a better wavelet-domain masking threshold than previous wavelet audio compression inventions and can selectively obtain higher time resolution or higher frequency resolution to suit music with different characteristics, thereby achieving higher music quality.

[Prior art]

With the advent of the information age, the transmission of multimedia over networks and its storage on portable devices such as PDAs and mobile phones have received increasing attention. Audio signals in particular are in great demand, and every application aims for high quality, low storage capacity, and low complexity. The importance of audio compression is most apparent in bandwidth-limited wireless environments and on portable devices with limited storage.

Audio comprises speech and music. Speech compression mostly uses linear prediction to build a model of the sound source and remove the redundancy in the speech, which reduces the amount of data. Music, by contrast, has a wider frequency distribution, and its sound production model in the frequency domain is unlike that of speech, so compression cannot be achieved by linear prediction of the source. Music coders therefore usually exploit the characteristics of human hearing to reduce the bit rate while remaining perceptually lossless.

Driven by consumers' strong demand for high-quality music, a variety of digital coding formats have emerged. The most widely used is the CD (Compact Disc) format, with a sampling frequency of 44.1 kHz and a bit rate of 1.4 Mbps. Such a high bit rate greatly limits its applications, and various music coding methods have been proposed in response. Algorithms based on a psychoacoustic model can compress high-quality two-channel music to bit rates of 128 to 192 kbps or lower. In terms of industry standards, the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) established the MPEG-1 Audio standard in 1992. It is divided into three layers: Layer 1 has the lowest complexity, with bit rates above 128 kbps; Layer 2 has medium complexity, with bit rates of about 128 kbps; Layer 3 has the highest complexity, with bit rates as low as 64 kbps. In 1997 the MPEG-2 AAC (Advanced Audio Coding) audio compression standard followed. Since 1998, MPEG-4 AAC, with MPEG-2 AAC as its core, has continued to add useful tools that provide more efficient compression.

ISO/MPEG-1 audio is based on subband coding and divides the audio band into 32 subbands of equal bandwidth. Its main drawback is that this division does not match the characteristics of human hearing: the number and bandwidth of the bands do not coincide with the critical bands. MPEG-1 Layer 3, MPEG-2 AAC, and MPEG-4 AAC use the MDCT (Modified Discrete Cosine Transform) as their transform core to obtain excellent frequency resolution, but their time resolution is poor as a result.

[Summary of the invention]

The main object of the present invention is therefore to obtain a better wavelet-domain masking threshold than previous wavelet audio compression inventions.

Another object of the present invention is to selectively obtain higher time resolution or higher frequency resolution to suit music with different characteristics and achieve higher music quality.

To achieve these objects, the present invention is an audio compression apparatus using mixed wavelet packets and discrete cosine transformation, composed of an encoder and a decoder. The encoder comprises: a wavelet packet module (Wavelet Packet) for dividing the audio into frequency bands; a selective discrete cosine transform module (SDCT) for converting the subband signals into spectral lines; a psychoacoustic model for computing the frequency-domain masking threshold; a bit allocation module (Bit allocation) for converting the frequency-domain masking threshold into a wavelet-domain masking threshold; a uniform quantizer (Quantizer) for quantizing the coefficients; and an entropy encoder (Entropy coder) for encoding the quantized coefficients. The decoder comprises: an entropy decoder (Entropy decoder) for mapping codewords back to the original coefficient values; an inverse quantizer (Dequantizer) for dequantizing the quantized coefficients; a selective inverse discrete cosine transform module (IDCT) for converting the spectral lines back into subband signals; and an inverse wavelet packet module (Inverse Wavelet Packet) for synthesizing the subband signals into time-domain audio. In this way, a better wavelet-domain masking threshold than in previous wavelet audio compression inventions can be obtained, and higher time resolution or higher frequency resolution can be selected to suit music with different characteristics and achieve higher music quality.

[Embodiment]

Please refer to Figures 1 to 4, which are, respectively, a block diagram of the encoding system of the present invention, a block diagram of the decoding system of the present invention, a schematic diagram of the wavelet packet structure used by the present invention, and a diagram of the performance of the present invention. As shown in the figures, the present invention is an audio compression apparatus using mixed wavelet packets and discrete cosine transformation, composed of an encoder and a decoder. It obtains a better wavelet-domain masking threshold than previous wavelet audio compression inventions and can selectively obtain higher time resolution or higher frequency resolution to suit music with different characteristics and achieve higher music quality.

The encoder 1 includes:

A wavelet packet module 11 (Wavelet Packet) divides the audio into frequency bands. The wavelet packet module 11 uses Daubechies wavelets for the band splitting and decomposes the audio into 26 subbands, in the manner shown in Figure 3. If splitting a band further yields a better result, the band is decomposed into two subbands; if splitting does not yield a better result, the band is left as it is. This test starts at the top level and processes every subband of a level in order before moving on to the next level. The number of decomposition levels is limited to 7.
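The level-by-level splitting test described for the wavelet packet module 11 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it uses the two-tap Haar filter pair (the Daubechies wavelet with one vanishing moment) to keep the code short, and the function `cost` is a hypothetical stand-in, because the text does not say how a "better result" is measured. The names `haar_split`, `cost`, and `decompose` are likewise illustrative.

```python
import math

SQRT2 = math.sqrt(2.0)
MAX_DEPTH = 7  # upper limit on decomposition levels stated in the text


def haar_split(band):
    """Split one band into low-pass and high-pass halves (Haar analysis)."""
    pairs = range(0, len(band) - 1, 2)
    low = [(band[i] + band[i + 1]) / SQRT2 for i in pairs]
    high = [(band[i] - band[i + 1]) / SQRT2 for i in pairs]
    return low, high


def cost(band):
    """Hypothetical cost (assumption): a smaller sum of magnitudes is 'better'."""
    return sum(abs(v) for v in band)


def decompose(signal, max_depth=MAX_DEPTH):
    """Test every band of a level before moving on to the next level."""
    current = [list(signal)]
    for _ in range(max_depth):
        next_level = []
        changed = False
        for band in current:
            if len(band) >= 2 and len(band) % 2 == 0:
                low, high = haar_split(band)
                # Keep the split only if it improves the assumed cost.
                if cost(low) + cost(high) < cost(band):
                    next_level.extend([low, high])
                    changed = True
                    continue
            next_level.append(band)  # leave the band as it is
        current = next_level
        if not changed:
            break  # no band was split at this level, so nothing more to test
    return current
```

Calling `decompose(samples)` returns the list of leaf bands in tree order. Because the Haar pair is orthonormal, the total energy of the leaf bands equals the energy of the input, which is a convenient sanity check.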
A selective discrete cosine transform module 12 (SDCT) converts the subband signals into spectral lines. The selective discrete cosine transform module 12 computes a flatness measure of the coefficients before and after the transformation, where the flatness measure is calculated as follows:

$$\mathrm{Flatness\_Measure}(x) = \frac{\left(\prod_{n=0}^{N-1} x(n)\right)^{1/N}}{\frac{1}{N}\sum_{n=0}^{N-1} x(n)}$$

The two measures are compared to select the signals that undergo the discrete cosine transformation: the transformation is performed only when it gives the better flatness measure of the two.

The transform formula of the selective discrete cosine transform module 12 (SDCT) is as follows:

$$X(k) = w(k)\sum_{n=0}^{N-1} x(n)\cos\frac{(2n+1)k\pi}{2N}, \qquad k = 0, 1, \ldots, N-1$$

$$w(k) = \begin{cases}\sqrt{1/N}, & k = 0\\ \sqrt{2/N}, & 1 \le k \le N-1\end{cases}$$
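The flatness test and the selective transform can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the patented implementation: the flatness is taken over coefficient magnitudes (coefficients can be negative), a small offset guards against the logarithm of zero, and a lower flatness value is assumed to be the "better" one because it means the energy is concentrated in fewer coefficients. The function names are illustrative.

```python
import math


def flatness_measure(x):
    """Geometric mean divided by arithmetic mean of the magnitudes in x."""
    mags = [abs(v) + 1e-12 for v in x]  # assumption: offset avoids log(0)
    count = len(mags)
    geometric = math.exp(sum(math.log(m) for m in mags) / count)
    arithmetic = sum(mags) / count
    return geometric / arithmetic


def dct(x):
    """Orthonormal DCT: X(k) = w(k) * sum x(n) cos((2n+1) k pi / 2N)."""
    n_pts = len(x)
    out = []
    for k in range(n_pts):
        w = math.sqrt(1.0 / n_pts) if k == 0 else math.sqrt(2.0 / n_pts)
        acc = sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * n_pts))
                  for n in range(n_pts))
        out.append(w * acc)
    return out


def selective_dct(subband):
    """Return (coefficients, used_dct) for one subband.

    Assumption: the domain with the LOWER flatness value is kept.
    """
    spectrum = dct(subband)
    if flatness_measure(spectrum) < flatness_measure(subband):
        return spectrum, True
    return list(subband), False
```

Under these assumptions a steady tone is sent through the DCT, where it collapses to a few spectral lines, while an isolated transient stays in the subband (time) domain, which matches the stated goal of choosing between frequency resolution and time resolution.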

A psychoacoustic model 13 computes the frequency-domain masking threshold. Using unpredictability and a spreading function, the psychoacoustic model 13 calculates the maximum masking threshold in the frequency domain.

A bit allocation module 14 (Bit allocation) converts the frequency-domain masking threshold into a wavelet-domain masking threshold. It performs this conversion on the basis of the frequency-domain masking threshold and the frequency response of the reconstruction filter of each subband.

A uniform quantizer 15 (Quantizer) quantizes the coefficients. It adjusts its quantization step according to the wavelet-domain masking threshold and quantizes the wavelet coefficients.

An entropy encoder 16 (Entropy coder) encodes the quantized coefficients. According to the probability with which coefficient values occur, it assigns shorter codewords to the values that occur more often.

The decoder 2 includes:

An entropy decoder 21 (Entropy decoder) maps codewords back to the original coefficient values. Using the codebook employed by the entropy encoder 16 during encoding, it finds the coefficient value that corresponds to each codeword. The entropy decoder 21 divides the data into bit planes (Bit Plane) and uses binary arithmetic coding (Binary Arithmetic Coding) to code them sequentially in order of bit significance. In image compression, this coding method not only compresses very well but also has the property of embedded coding (Embedded coding): depending on the required bit rate, the data beyond that bit rate can be discarded and only the leading part transmitted, and the decoder can still decode it successfully with quite good image quality. The entropy decoder 21 may also simply gather the occurrence probability of each quantized coefficient value within a band and use it to build the probability table needed for arithmetic coding. When the coefficient values are coded, every frame is arithmetic coded with the probability table, and a termination step is performed after each frame has been coded in order to limit error propagation.

An inverse quantizer 22 (Dequantizer) dequantizes the quantized coefficients. It multiplies each quantized coefficient by the quantization step to obtain the dequantized coefficient.
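The uniform quantizer 15 and the inverse quantizer 22 form a matched pair, which the sketch below illustrates. The rule in `step_from_threshold` is an assumption made for illustration only: it treats the masking threshold of a band as an allowed noise power and uses the textbook approximation that a uniform quantizer with step Δ adds noise of power Δ²/12. The text itself only states that the step is adjusted according to the wavelet-domain masking threshold. All function names are hypothetical.

```python
import math


def step_from_threshold(mask_power):
    """Assumed rule: largest step whose noise power (step**2 / 12)
    does not exceed the allowed masking power of the band."""
    return math.sqrt(12.0 * mask_power)


def quantize(coeffs, step):
    """Uniform quantizer: map each coefficient to an integer index."""
    return [int(round(c / step)) for c in coeffs]


def dequantize(indices, step):
    """Inverse quantizer: multiply each index by the quantization step."""
    return [q * step for q in indices]
```

With this pair, the reconstruction error of every coefficient is at most half a step, so a larger masking threshold permits a larger step and fewer distinct index values for the entropy encoder to code.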
A selective inverse discrete cosine transform module 23 (IDCT) converts the spectral lines back into subband signals. According to the information recorded in the bit stream, it performs the inverse discrete cosine transformation when required, converting the spectral lines back into subband coefficients. The transform formula of the selective inverse discrete cosine transform module 23 (IDCT) is as follows:

$$x(n) = \sum_{k=0}^{N-1} w(k)\,X(k)\cos\frac{(2n+1)k\pi}{2N}, \qquad n = 0, 1, \ldots, N-1$$

$$w(k) = \begin{cases}\sqrt{1/N}, & k = 0\\ \sqrt{2/N}, & 1 \le k \le N-1\end{cases}$$
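On the decoder side, the inverse transform is applied only where the encoder used the forward transform. The sketch below assumes that this side information arrives as one boolean per band; the list `dct_flags` is a hypothetical representation, since the text only says that the decision is recorded in the bit stream. The forward `dct` is included solely so that the round trip can be checked.

```python
import math


def _weight(k, n_pts):
    """Normalization w(k) shared by the forward and inverse transforms."""
    return math.sqrt(1.0 / n_pts) if k == 0 else math.sqrt(2.0 / n_pts)


def dct(x):
    """Forward transform, included only to check the round trip."""
    n_pts = len(x)
    return [_weight(k, n_pts)
            * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * n_pts))
                  for n in range(n_pts))
            for k in range(n_pts)]


def idct(spectrum):
    """Inverse transform: x(n) = sum w(k) X(k) cos((2n+1) k pi / 2N)."""
    n_pts = len(spectrum)
    return [sum(_weight(k, n_pts) * spectrum[k]
                * math.cos((2 * n + 1) * k * math.pi / (2 * n_pts))
                for k in range(n_pts))
            for n in range(n_pts)]


def selective_idct(bands, dct_flags):
    """Invert only the bands whose flag says the encoder applied the DCT."""
    return [idct(band) if used else list(band)
            for band, used in zip(bands, dct_flags)]
```

Because the forward and inverse transforms share the same weights `w(k)`, the pair is orthonormal and `idct(dct(x))` returns `x` up to floating-point rounding.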

An inverse wavelet packet module 24 (Inverse Wavelet Packet) synthesizes the subband signals into time-domain audio. It passes the subband signals through wavelet synthesis filters to produce the time-domain audio signal.

When the number of vanishing moments N is specified and minimum support is required, a number of candidate wavelets are available to choose from. When the adaptive wavelet transform kernel is enabled, each frame selects, from all candidate sets of wavelet coefficients, the set that minimizes equation (4.9) and uses it as the wavelet transform kernel for that frame. In this experiment, fixed wavelet packets were used and the DCT was not used.

Daubechies compactly supported wavelets with 8, 12, and 20 vanishing moments were tested to compare the compression performance of the fixed wavelet kernel with that of the adaptive wavelet kernel. The experimental results are shown in Figure 4, which also includes the compression results of AAC LC, AAC HE, and MP3 for comparison. The results show that the adaptive wavelet kernel does improve sound quality. Note, however, that the complexity of the adaptation rises exponentially with the length of the wavelet filter. With 20 vanishing moments, for example, the number of candidate wavelets is already so large that the computational complexity increases more than a thousandfold, yet the improvement is very limited. Rather than using adaptive wavelets, it is more direct and effective to adopt a wavelet with more vanishing moments, whose complexity burden grows only linearly. The final system therefore uses only a fixed wavelet kernel.

In summary, the audio compression apparatus of the present invention combines wavelet packets with discrete cosine transformation. Using wavelet packet band splitting, it divides the music signal into 26 subbands through a filter bank and then decides, according to the flatness in the time domain and in the frequency domain, whether to apply the discrete cosine transformation as well. The invention also uses an optimal bit allocation algorithm for non-ideal synthesis filters to convert the minimum frequency-domain masking threshold obtained from the psychoacoustic model into a masking threshold in the wavelet domain, which provides a precise quantization criterion. A uniform quantizer guided by the wavelet-domain masking threshold then greatly reduces the amount of data while retaining very high sound quality. Finally, arithmetic coding entropy codes the quantized coefficients, which are packed into a bit stream. With the Daubechies compactly supported wavelet with 12 vanishing moments, the compression performance of the invention is as shown in Figure 4, which also includes the results of AAC LC, AAC HE, and MP3 for comparison. The results show that the invention needs only 52 kbps to match the sound quality of MP3 at 64 kbps. At the same bit rate of 64 kbps, the sound quality of the invention is better than that of MP3 and the AAC low-complexity profile and even surpasses that of the AAC high-efficiency profile.

The invention is more advanced, more practical, and better suited to users' needs, and it meets the requirements for an invention patent application. The application is hereby filed in accordance with the law, and the examiners are respectfully requested to review it and grant the patent at an early date.

The foregoing is merely a preferred embodiment of the present invention and does not limit the scope in which the invention may be practiced. Simple equivalent changes and modifications made in accordance with the claims and the description of the invention therefore remain within the scope covered by this patent.

[Brief description of the drawings]

Figure 1 is a block diagram of the encoding system of the present invention.
Figure 2 is a block diagram of the decoding system of the present invention.
Figure 3 is a schematic diagram of the wavelet packet structure used by the present invention.
Figure 4 is a diagram of the performance of the present invention.

[Reference numerals]

Encoder 1
Wavelet packet module 11
Selective discrete cosine transform module 12
Psychoacoustic model 13
Bit allocation module 14
Uniform quantizer 15
Entropy encoder 16
Decoder 2
Entropy decoder 21
Inverse quantizer 22
Selective inverse discrete cosine transform module 23
Inverse wavelet packet module 24

Claims

1. An audio compression apparatus using mixed wavelet packets and discrete cosine transformation, composed of an encoder and a decoder, wherein the encoder comprises: a wavelet packet module (Wavelet Packet) for dividing the audio into frequency bands; a selective discrete cosine transform module (SDCT) for converting the subband signals into spectral lines; a psychoacoustic model for computing the frequency-domain masking threshold; a bit allocation module (Bit allocation) for converting the frequency-domain masking threshold into a wavelet-domain masking threshold; a uniform quantizer (Quantizer) for quantizing the coefficients; and an entropy encoder (Entropy coder) for encoding the quantized coefficients; and wherein the decoder comprises: an entropy decoder (Entropy decoder) for mapping codewords back to the original coefficient values; an inverse quantizer (Dequantizer) for dequantizing the quantized coefficients; a selective inverse discrete cosine transform module (IDCT) for converting the spectral lines back into subband signals; and an inverse wavelet packet module (Inverse Wavelet Packet) for synthesizing the subband signals into time-domain audio.

[...] and discrete cosine transformation, wherein the uniform quantizer adjusts its quantization step according to the wavelet-domain masking threshold and quantizes the wavelet coefficients.

9. The audio compression apparatus using mixed wavelet packets and discrete cosine transformation according to claim 1, wherein the entropy encoder, according to the probability with which coefficient values occur, assigns shorter codewords to the values that occur more often.

10. The audio compression apparatus using mixed wavelet packets and discrete cosine transformation according to claim 1, wherein the entropy decoder uses the codebook employed by the entropy encoder during encoding to find the coefficient value corresponding to each codeword.

11. The audio compression apparatus using mixed wavelet packets and discrete cosine transformation according to claim 1, wherein the inverse quantizer multiplies each quantized coefficient by the quantization step to obtain the dequantized coefficient.

12. The audio compression apparatus using mixed wavelet packets and discrete cosine transformation according to claim 1, wherein the selective inverse discrete cosine transform module, according to the information recorded in the bit stream, performs the inverse discrete cosine transformation when required, converting the spectral lines back into subband coefficients.

13. The audio compression apparatus using mixed wavelet packets and discrete cosine transformation according to claim 1, wherein the inverse wavelet packet module passes the subband signals through wavelet synthesis filters to synthesize the time-domain audio signal.
