TWI246256B - Apparatus for audio compression using mixed wavelet packets and discrete cosine transformation - Google Patents


Info

Publication number
TWI246256B
Authority
TW
Taiwan
Prior art keywords
discrete cosine
wavelet
wavelet packet
audio compression
audio
Application number
TW93120054A
Other languages
Chinese (zh)
Other versions
TW200603553A (en)
Inventor
Pao-Chi Chang
Tsung-Hung Wu
Original Assignee
Univ Nat Central
Priority date: 2004-07-02
Filing date: 2004-07-02
Publication date: 2005-12-21
Application filed by Univ Nat Central
Priority to TW93120054A
Application granted
Publication of TWI246256B
Publication of TW200603553A


Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

An apparatus for audio compression using mixed wavelet packet and discrete cosine transformation techniques is provided. The apparatus uses wavelet packet frequency division to split the audio signal into a plurality of subbands through a bank of filters. Then, based on the flatness in the time domain and the frequency domain, it determines whether to apply the discrete cosine transformation. Using an optimal bit allocation algorithm for non-ideal synthesis filters, the apparatus converts the minimum frequency-domain masking threshold derived from the psychoacoustic model into a masking threshold in the wavelet domain, which provides an accurate quantization guideline. A quantizer driven by this wavelet-domain masking threshold then greatly reduces the amount of data while maintaining high audio quality. Finally, entropy coding is applied to the quantized parameters, which are packed into a bitstream. As a result, a better wavelet-domain masking threshold is obtained, and either higher time resolution or higher frequency resolution can be selected to match the characteristics of different music, yielding higher music quality.

Description

[Technical Field of the Invention]

The present invention relates to an audio compression apparatus with mixed wavelet packet and discrete cosine transformation, and more particularly to one that obtains a better wavelet-domain masking threshold than previous wavelet audio compression inventions and that can selectively obtain higher time resolution or higher frequency resolution to match music with different characteristics, thereby achieving higher music quality.

[Prior Art]

With the arrival of the information age, the transmission of multimedia over networks and its storage on portable devices such as PDAs and mobile phones have received increasing attention. Audio signals in particular are in very high demand, and every application wants high quality, low storage capacity, and low complexity. The importance of audio compression is especially evident in bandwidth-limited wireless environments and on storage-limited portable devices.

Audio includes speech and music. Speech compression mostly relies on linear prediction to model the sound source and remove redundant data from the speech, thereby reducing the data volume. Compared with speech, music has a wider frequency distribution and requires higher quality. Because the production model of music is unlike that of speech, compression cannot be achieved by linear prediction of the source; instead, the properties of human hearing are usually exploited to lower the bit rate while keeping the distortion inaudible.

To meet consumers' strong demand for high-quality music, a variety of digital coding formats have appeared. The most widely used is the CD (Compact Disc) format, with a sampling frequency of 44.1 kHz and a bit rate of 1.4 Mbps. Such a high bit rate severely limits its applications, and various music coding and compression methods have therefore been proposed.

Algorithms based on psychoacoustic models can compress high-quality two-channel music to 192 to 128 kbps or lower. On the industry-standard side, the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) established the MPEG-1 Audio standard in 1992. It is divided into three layers: Layer 1 has the lowest complexity and operates above 128 kbps; Layer 2 has medium complexity and operates at about 128 kbps; Layer 3 has the highest complexity and reaches bit rates as low as 64 kbps. In 1997 the MPEG-2 AAC (Advanced Audio Coding) audio compression standard was established, and since 1998 MPEG-4 AAC, built around MPEG-2 AAC, has continually added useful tools to deliver more efficient compression. ISO/MPEG-1 Audio is based on subband coding and divides the audio band into 32 subbands of equal bandwidth. Its main drawback is that this band division does not follow the characteristics of human hearing: the number and bandwidth of the bands do not match the so-called critical bands.
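The 1.4 Mbps figure for CD audio follows directly from the sampling parameters. The short sketch below assumes standard 16-bit, two-channel PCM (bit depth and channel count are not stated in the text); it reproduces that figure and computes the compression ratios implied by the bit rates quoted above. Function names are illustrative.

```python
# Raw PCM bit rate and the compression ratios implied by the quoted bit rates.
# Assumption: CD audio is 16-bit, two-channel PCM (not stated in the text).
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2


def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(source_bps: float, target_bps: float) -> float:
    """Factor by which the coded stream is smaller than the source."""
    return source_bps / target_bps


cd_bps = pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
print(f"CD PCM: {cd_bps} bit/s")  # 1411200 bit/s, i.e. about 1.4 Mbps
for kbps in (192, 128, 64):
    print(f"{kbps} kbps -> {compression_ratio(cd_bps, kbps * 1000):.2f}:1")
```

At 128 kbps, for example, the coded stream is roughly one eleventh the size of the CD stream.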
MPEG-1 Layer 3, MPEG-2 AAC, and MPEG-4 AAC use the MDCT (Modified Discrete Cosine Transform) as their transform core to obtain excellent frequency resolution, but this leaves their time resolution very poor.

[Summary of the Invention]

Accordingly, the main object of the present invention is to obtain a better wavelet-domain masking threshold than previous wavelet audio compression inventions.

Another object of the present invention is to selectively obtain higher time resolution or higher frequency resolution to match music with different characteristics and achieve higher music quality.

To achieve the above objects, the present invention is an audio compression apparatus with mixed wavelet packet and discrete cosine transformation, composed of an encoder and a decoder. The encoder includes:
a wavelet packet module (Wavelet Packet), for splitting the audio into frequency bands;
a selective discrete cosine transform module (SDCT), for converting subband signals into spectral lines;
a psychoacoustic model (Psychoacoustic model), for calculating the frequency-domain masking threshold;
a bit allocation module (Bit allocation), for converting the frequency-domain masking threshold into a wavelet-domain masking threshold;
a uniform quantizer (Quantizer), for quantizing the coefficients; and
an entropy coder (Entropy coder), for encoding the quantized coefficients.

The decoder includes:
an entropy decoder (Entropy decoder), for mapping codewords back to the original coefficient values;
a dequantizer (Dequantizer), for dequantizing the quantized coefficients;
a selective inverse discrete cosine transform module (IDCT), for converting spectral lines back into subband signals; and
an inverse wavelet packet module (Inverse Wavelet Packet), for synthesizing the subband signals into time-domain audio.
In this way, a better wavelet-domain masking threshold than in previous wavelet audio compression inventions can be obtained, and higher time resolution or higher frequency resolution can be selectively obtained to match music with different characteristics and achieve higher music quality.

[Embodiments]

Please refer to Figs. 1 to 4, which are, respectively, a block diagram of the encoding system of the present invention, a block diagram of the decoding system of the present invention, a schematic diagram of the wavelet packet structure used by the present invention, and a schematic diagram of the performance of the present invention. As shown, the present invention is an audio compression apparatus with mixed wavelet packet and discrete cosine transformation, composed of an encoder and a decoder. It obtains a better wavelet-domain masking threshold than previous wavelet audio compression inventions and can selectively obtain higher time resolution or higher frequency resolution to match music with different characteristics, achieving higher music quality.

The encoder 1 includes:

a wavelet packet module 11 (Wavelet Packet), for splitting the audio into frequency bands. The wavelet packet module 11 uses Daubechies wavelets for the band splitting and decomposes the audio into 26 subbands, as shown in Fig. 3. If a band yields a better result after further splitting, it is decomposed into two subbands; if splitting does not yield a better result, the band is left as it is. This test starts at the top level, is applied in turn to every subband of that level, and only then proceeds to the next level; the decomposition is limited to at most 7 levels;

a selective discrete cosine transform module 12 (SDCT), for converting subband signals into spectral lines. The selective discrete cosine transform module 12 uses the flatness measure of a subband signal, evaluated before and after the transform, as its decision value.
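The top-down, level-by-level split test performed by the wavelet packet module 11 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the orthonormal Haar filter stands in for the longer Daubechies filters, and because the text does not say how a "better result" is judged, a hypothetical l1-norm cost is used (a band is split only if its two children have a lower total cost than the band itself). The 7-level limit comes from the text; the function names are illustrative.

```python
import math
from collections import deque

SQRT_HALF = math.sqrt(0.5)


def haar_split(band):
    """One analysis stage: orthonormal Haar low-pass/high-pass, downsampled by 2.
    Haar (Daubechies with one vanishing moment) stands in for the longer
    Daubechies filters of module 11, so no boundary extension is needed."""
    pairs = range(len(band) // 2)
    low = [(band[2 * i] + band[2 * i + 1]) * SQRT_HALF for i in pairs]
    high = [(band[2 * i] - band[2 * i + 1]) * SQRT_HALF for i in pairs]
    return low, high


def l1_cost(band):
    """Hypothetical 'better result' criterion: the l1 norm, an additive cost.
    An orthonormal split preserves energy, so a lower l1 norm means the
    energy is concentrated in fewer coefficients."""
    return sum(abs(v) for v in band)


def split_test(signal, max_depth=7):
    """Top-down, level-by-level split test. Returns the leaf bands as
    (level, index, coefficients); indices are in natural (Paley) order,
    not sorted by frequency."""
    leaves = []
    queue = deque([(0, 0, list(signal))])  # FIFO: finish a level before the next
    while queue:
        level, index, band = queue.popleft()
        if level < max_depth and len(band) >= 2 and len(band) % 2 == 0:
            low, high = haar_split(band)
            if l1_cost(low) + l1_cost(high) < l1_cost(band):
                queue.append((level + 1, 2 * index, low))
                queue.append((level + 1, 2 * index + 1, high))
                continue
        leaves.append((level, index, band))  # splitting did not help: keep as is
    return leaves


leaves = split_test([1.0] * 8)
print([(lvl, idx) for lvl, idx, _ in leaves])  # [(1, 1), (2, 1), (3, 0), (3, 1)]
```

On a constant input only the low-pass branch keeps splitting: every high-pass child is all zeros, and splitting it cannot lower the cost.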

The flatness measure of a length-$N$ signal $x$ is computed as:

$$\mathrm{Flatness\_Measure}(x) = \frac{\left(\prod_{n=0}^{N-1} |x(n)|\right)^{1/N}}{\frac{1}{N}\sum_{n=0}^{N-1} |x(n)|}$$

Using this measure, the module selects the signals that need the discrete cosine transform: the DCT is performed only when the flatness measure after the transform is poorer than before it.
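A minimal sketch of the selective DCT decision follows. It assumes that a "poorer" flatness measure means a lower geometric-to-arithmetic-mean ratio (the transformed coefficients are less flat, so their energy is more compacted). The small `eps` that keeps the logarithm defined at zero-valued samples is an added assumption, the DCT is the orthonormal DCT-II used by module 12, and all names are hypothetical.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II: X(k) = w(k) * sum_n x(n) cos(pi (2n+1) k / (2N))."""
    n_len = len(x)
    out = []
    for k in range(n_len):
        w = math.sqrt(1.0 / n_len) if k == 0 else math.sqrt(2.0 / n_len)
        s = sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_len))
                for n in range(n_len))
        out.append(w * s)
    return out


def flatness_measure(x, eps=1e-12):
    """Geometric mean over arithmetic mean of |x(n)|; 1.0 for a flat signal.
    eps (an assumption) keeps log() defined when a sample is exactly zero."""
    mags = [abs(v) + eps for v in x]
    geo = math.exp(sum(math.log(m) for m in mags) / len(mags))
    arith = sum(mags) / len(mags)
    return geo / arith


def selective_dct(subband):
    """Return (coefficients, used_dct); the DCT is kept only if it
    lowers the flatness measure of the subband."""
    spectrum = dct_ii(subband)
    if flatness_measure(spectrum) < flatness_measure(subband):
        return spectrum, True
    return list(subband), False


N = 16
tone = [math.cos(math.pi * (2 * n + 1) * 3 / (2 * N)) for n in range(N)]
impulse = [1.0] + [0.0] * (N - 1)
print(selective_dct(tone)[1], selective_dct(impulse)[1])  # True False
```

A tonal subband collapses onto a single DCT bin and is transformed, while an impulse-like subband is already compact in time and is left untouched; this is how the selective transform trades frequency resolution against time resolution per subband.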

The transform performed by the selective discrete cosine transform module 12 (SDCT) is:

$$X(k) = w(k) \sum_{n=0}^{N-1} x(n) \cos\left(\frac{\pi (2n+1) k}{2N}\right), \quad k = 0, 1, \ldots, N-1$$

$$w(k) = \begin{cases} \sqrt{1/N}, & k = 0 \\ \sqrt{2/N}, & 1 \le k \le N-1 \end{cases}$$

a psychoacoustic model 13 (Psychoacoustic model), for calculating the frequency-domain masking threshold. The psychoacoustic model 13 uses the unpredictability measure and the spreading function to compute the maximum masking threshold in the frequency domain;

a bit allocation module 14 (Bit allocation), for converting the frequency-domain masking threshold into a wavelet-domain masking threshold. Based on the frequency-domain masking threshold and the frequency response of the reconstruction filter of each subband, the bit allocation module 14 converts the frequency-domain masking threshold into a wavelet-domain masking threshold;

a uniform quantizer 15 (Quantizer), for quantizing the coefficients. The uniform quantizer 15 adjusts its quantization step according to the wavelet-domain masking threshold and quantizes the wavelet coefficients;

an entropy coder 16 (Entropy coder), for encoding the quantized coefficients. According to the probability of occurrence of each coefficient value, the entropy coder 16 represents more probable values with codewords of fewer bits.

The decoder 2 includes:

an entropy decoder 21 (Entropy decoder), for mapping codewords back to the original coefficient values. Using the codebook adopted by the entropy coder 16 during encoding, the entropy decoder 21 finds the coefficient value corresponding to each codeword. The entropy decoder 21 also splits the data into bit-planes and uses binary arithmetic coding to code them sequentially in order of bit significance. In image compression, this scheme not only compresses very well but is also embedded: according to the bit-rate requirement, data beyond the target bit rate can be discarded and only part of the data transmitted, and the decoder can still decode it successfully with quite good image quality. Alternatively, the entropy decoder 21 may build the probability table needed for arithmetic coding solely from the probability of occurrence of each quantized coefficient value within a band. During coding, each frame is arithmetic-coded with its own probability table, and a termination step is performed after each frame, which helps limit error propagation;

a dequantizer 22 (Dequantizer), for dequantizing the quantized coefficients. The dequantizer 22 multiplies the quantized coefficients by the quantization step to obtain the dequantized coefficients;

a selective inverse discrete cosine transform module 23 (IDCT), for converting spectral lines back into subband signals. Based on the information recorded in the bitstream, the selective inverse discrete cosine transform module 23 performs the inverse DCT where needed, converting the spectral lines back into subband coefficients.
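The sketch below illustrates how the bit allocation module 14, the uniform quantizer 15, and the dequantizer 22 could fit together. Only the dequantizer rule (index times step) comes from the text. The threshold mapping is a hypothetical conservative rule, not the patent's optimal bit allocation algorithm for non-ideal synthesis filters: for one subband it takes the smallest ratio T(f)/|G(f)|² over the frequency bins where the synthesis filter G passes energy. The step size uses the standard high-resolution approximation that uniform quantization noise has power step²/12. All names and numbers are illustrative.

```python
import math


def wavelet_domain_threshold(freq_threshold, synth_gain_sq, floor=1e-6):
    """Hypothetical conservative mapping for one subband.

    freq_threshold[i] is the frequency-domain masking threshold T(f_i);
    synth_gain_sq[i] is |G(f_i)|^2 for the subband's synthesis filter.
    Returns the largest white noise power in the subband whose shaped
    spectrum stays at or below T(f) wherever the filter passes energy.
    Raises ValueError if the response never exceeds `floor`."""
    ratios = [t / g for t, g in zip(freq_threshold, synth_gain_sq) if g > floor]
    return min(ratios)


def step_from_threshold(noise_power):
    """Uniform quantizer step whose noise power, step**2 / 12, equals the threshold."""
    return math.sqrt(12.0 * noise_power)


def quantize(coeffs, step):
    """Uniform mid-tread quantizer: index of the nearest multiple of the step."""
    return [round(c / step) for c in coeffs]


def dequantize(indices, step):
    """As described for the dequantizer 22: multiply each index by the step."""
    return [q * step for q in indices]


T = [0.03, 0.01, 0.02, 0.05]   # illustrative frequency-domain thresholds
G2 = [0.0, 0.5, 1.0, 0.25]     # illustrative |G(f)|^2 of one synthesis filter
t_b = wavelet_domain_threshold(T, G2)  # min(0.02, 0.02, 0.2) = 0.02
step = step_from_threshold(t_b)
coeffs = [1.0, -0.3, 0.05, 2.2]
rec = dequantize(quantize(coeffs, step), step)
print(quantize(coeffs, step))  # [2, -1, 0, 4]
```

Each reconstructed coefficient differs from the original by at most half a step, so a lower masking threshold directly buys a finer step and more bits.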

The inverse transform performed by the selective inverse discrete cosine transform module 23 (IDCT) is:

$$x(n) = \sum_{k=0}^{N-1} w(k) X(k) \cos\left(\frac{\pi (2n+1) k}{2N}\right), \quad n = 0, 1, \ldots, N-1$$

$$w(k) = \begin{cases} \sqrt{1/N}, & k = 0 \\ \sqrt{2/N}, & 1 \le k \le N-1 \end{cases}$$

an inverse wavelet packet module 24 (Inverse Wavelet Packet), for synthesizing the subband signals into time-domain audio. The inverse wavelet packet module 24 passes the subband signals through the wavelet synthesis filters to synthesize the time-domain audio signal.

When the number of vanishing moments N is specified and minimum support is required, there are multiple candidate wavelets to choose from. When the adaptive wavelet transform kernel is enabled, each frame selects, from all candidate sets of wavelet coefficients, the set that minimizes Equation (4.9) and uses it as the wavelet transform kernel for that frame. In this experiment, a fixed wavelet packet structure was used without the DCT, and the compression performance of the fixed and adaptive wavelet kernels was tested with Daubechies compactly supported wavelets having between 8 and 20 vanishing moments. The results are shown in Fig. 4, which, in addition to the present invention, includes the compression results of AAC LC, AAC HE, and MP3 for comparison.

The experiments show that the adaptive wavelet kernel does improve audio quality. Note, however, that the complexity of the adaptive scheme grows exponentially with the length of the wavelet filters.
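To check that the decoder-side inverses undo the encoder-side transforms, the sketch below implements the inverse DCT of module 23, x(n) = Σ_k w(k) X(k) cos(π(2n+1)k / 2N), together with the matching forward DCT, and pairs them with a one-stage Haar synthesis filter that stands in for the Daubechies synthesis filters of module 24 (an assumption; the patent's filters are longer). The DCT is applied to one subband only, mirroring the selective use of the transform. Names are hypothetical.

```python
import math

SQRT_HALF = math.sqrt(0.5)


def _w(k, n_len):
    """Normalization weights shared by the forward and inverse transforms."""
    return math.sqrt(1.0 / n_len) if k == 0 else math.sqrt(2.0 / n_len)


def dct_ii(x):
    """Forward orthonormal DCT-II (encoder side, module 12)."""
    n_len = len(x)
    return [_w(k, n_len) * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_len))
                               for n in range(n_len))
            for k in range(n_len)]


def idct(spectrum):
    """Inverse transform (decoder side, module 23)."""
    n_len = len(spectrum)
    return [sum(_w(k, n_len) * spectrum[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_len))
                for k in range(n_len))
            for n in range(n_len)]


def haar_analysis(band):
    """One Haar analysis stage (stand-in for the Daubechies analysis filters)."""
    pairs = range(len(band) // 2)
    return ([(band[2 * i] + band[2 * i + 1]) * SQRT_HALF for i in pairs],
            [(band[2 * i] - band[2 * i + 1]) * SQRT_HALF for i in pairs])


def haar_synthesis(low, high):
    """One synthesis stage (module 24 analogue): upsample and recombine two subbands."""
    out = []
    for a, d in zip(low, high):
        out.append((a + d) * SQRT_HALF)
        out.append((a - d) * SQRT_HALF)
    return out


signal = [0.5, -1.0, 2.0, 0.25, 0.0, 1.5, -0.75, 3.0]
low, high = haar_analysis(signal)
rebuilt = haar_synthesis(idct(dct_ii(low)), high)  # DCT on the low band only
print(max(abs(a - b) for a, b in zip(signal, rebuilt)))  # close to 0 (rounding only)
```

Because both stages are orthonormal, reconstruction is exact up to floating-point rounding; in the real codec, all loss comes from the quantizer.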
For example, with 20 vanishing moments the set of candidate wavelets is already so large that the computational complexity increases more than a thousandfold, yet the improvement is very limited. Rather than using adaptive wavelets, it is more direct and effective to use a wavelet with more vanishing moments, whose complexity cost grows only linearly. The final system therefore uses only a fixed wavelet kernel.

In summary, the audio compression apparatus with mixed wavelet packet and discrete cosine transformation of the present invention mixes the wavelet transform with the discrete cosine transformation. Using wavelet packet band splitting, it divides the music signal into 26 subbands through a filter bank and then decides, according to the flatness in the time and frequency domains, whether to apply the discrete cosine transformation as well. The invention also uses an optimal bit allocation algorithm for non-ideal synthesis filters to convert the minimum frequency-domain masking threshold obtained from the psychoacoustic model into a masking threshold in the wavelet domain, providing a refined quantization criterion. A uniform quantizer then works with the wavelet-domain masking threshold to greatly reduce the amount of data while keeping very high audio quality. Finally, arithmetic coding is applied as entropy coding to the quantized coefficients, which are packed into a bitstream. With a Daubechies compactly supported wavelet having 12 vanishing moments, the compression performance of the present invention is as shown in Fig. 4, which also includes the compression results of AAC LC, AAC HE, and MP3 for comparison. The results show that the present invention needs only 52 kbps to match the audio quality of MP3 at 64 kbps.
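The arithmetic-coding stage summarized above works on bit-planes and per-frame probability tables, as described for the entropy decoder 21. The sketch below shows only those two ingredients: a sign-magnitude bit-plane split with embedded (truncated) reconstruction, and a per-frame probability table together with the ideal code length, -Σ log₂ p, that an arithmetic coder using that table would approach. The sign-magnitude layout, the omission of the arithmetic coder itself, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter


def bit_planes(indices):
    """Split quantized indices into sign bits and magnitude bit-planes,
    most significant plane first (sign-magnitude layout is an assumption)."""
    mags = [abs(q) for q in indices]
    n_planes = max(mags, default=0).bit_length()
    planes = [[(m >> p) & 1 for m in mags] for p in range(n_planes - 1, -1, -1)]
    signs = [1 if q < 0 else 0 for q in indices]
    return signs, planes


def reconstruct(signs, planes, keep):
    """Rebuild indices from only the `keep` most significant planes,
    as an embedded decoder would after the stream is truncated."""
    n_planes = len(planes)
    mags = [0] * len(signs)
    for i, plane in enumerate(planes[:keep]):
        shift = n_planes - 1 - i
        mags = [m | (b << shift) for m, b in zip(mags, plane)]
    return [-m if s else m for m, s in zip(mags, signs)]


def frame_probability_table(indices):
    """Per-frame probability of each quantized value."""
    counts = Counter(indices)
    return {value: count / len(indices) for value, count in counts.items()}


def ideal_code_length_bits(indices):
    """Total -log2 p over the frame for the frame's own probability table."""
    table = frame_probability_table(indices)
    return sum(-math.log2(table[q]) for q in indices)


frame = [5, -3, 0, 1, 0, 0, -1, 0]
signs, planes = bit_planes(frame)
print(reconstruct(signs, planes, keep=1))  # [4, 0, 0, 0, 0, 0, 0, 0]
print(ideal_code_length_bits(frame))       # 16.0
```

Dropping the least significant planes keeps the largest coefficients roughly intact, which is why the embedded stream can be cut to any target bit rate and still be decoded.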
In addition, at the same bit rate of 64 kbps, the audio quality provided by the present invention is not only better than that of MP3 and the AAC Low Complexity profile but also surpasses the AAC High Efficiency profile. The present invention is more advanced, more practical, and better suited to users' needs, and it satisfies the requirements for an invention patent; this patent application is therefore filed in accordance with the law, and the examiner is respectfully requested to review it and grant the patent at an early date.

However, the above describes only preferred embodiments of the present invention and shall not limit the scope of its implementation; all simple equivalent changes and modifications made according to the claims and the description of the present invention shall remain within the scope covered by this patent.

[Brief Description of the Drawings]

Fig. 1 is a block diagram of the encoding system of the present invention.
Fig. 2 is a block diagram of the decoding system of the present invention.
Fig. 3 is a schematic diagram of the wavelet packet structure used in the present invention.
Fig. 4 is a schematic diagram of the performance of the present invention.

[List of Reference Numerals]

Encoder 1
Wavelet packet module 11
Selective discrete cosine transform module 12
Psychoacoustic model 13
Bit allocation module 14
Uniform quantizer 15
Entropy coder 16
Decoder 2
Entropy decoder 21
Dequantizer 22
Selective inverse discrete cosine transform module 23
Inverse wavelet packet module 24

Claims (1)

Claims:

1. An audio compression apparatus with mixed wavelet packet and discrete cosine transformation, composed of an encoder and a decoder, wherein the encoder comprises:
a wavelet packet module (Wavelet Packet), for splitting the audio into frequency bands;
a selective discrete cosine transform module (SDCT), for converting subband signals into spectral lines;
a psychoacoustic model (Psychoacoustic model), for calculating a frequency-domain masking threshold;
a bit allocation module (Bit allocation), for converting the frequency-domain masking threshold into a wavelet-domain masking threshold;
a uniform quantizer (Quantizer), for quantizing coefficients; and
an entropy coder (Entropy coder), for encoding the quantized coefficients;
and the decoder comprises:
an entropy decoder (Entropy decoder), for mapping codewords back to the original coefficient values;
a dequantizer (Dequantizer), for dequantizing the quantized coefficients;
a selective inverse discrete cosine transform module (IDCT), for converting spectral lines back into subband signals; and
an inverse wavelet packet module (Inverse Wavelet Packet), for synthesizing the subband signals into time-domain audio.

[...] audio compression apparatus with mixed wavelet packet and discrete cosine transformation, wherein the uniform quantizer adjusts its quantization step according to the wavelet-domain masking threshold and quantizes the wavelet coefficients.

9. The audio compression apparatus with mixed wavelet packet and discrete cosine transformation according to claim 1, wherein the entropy coder, according to the probability of occurrence of the coefficients, represents more probable values with codewords of fewer bits.

10. The audio compression apparatus with mixed wavelet packet and discrete cosine transformation according to claim 1, wherein the entropy decoder finds the coefficient value corresponding to each codeword according to the codebook used by the entropy coder during encoding.

11. The audio compression apparatus with mixed wavelet packet and discrete cosine transformation according to claim 1, wherein the dequantizer multiplies the quantized coefficients by the quantization step to obtain the dequantized coefficients.

12. The audio compression apparatus with mixed wavelet packet and discrete cosine transformation according to claim 1, wherein the selective inverse discrete cosine transform module performs the inverse discrete cosine transform where needed, according to the information recorded in the bitstream, to convert the spectral lines back into subband coefficients.

13. The audio compression apparatus with mixed wavelet packet and discrete cosine transformation according to claim 1, wherein the inverse wavelet packet module synthesizes the subband signals into a time-domain audio signal through wavelet synthesis filters.
TW93120054A 2004-07-02 2004-07-02 Apparatus for audio compression using mixed wavelet packets and discrete cosine transformation TWI246256B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW93120054A TWI246256B (en) 2004-07-02 2004-07-02 Apparatus for audio compression using mixed wavelet packets and discrete cosine transformation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW93120054A TWI246256B (en) 2004-07-02 2004-07-02 Apparatus for audio compression using mixed wavelet packets and discrete cosine transformation

Publications (2)

Publication Number Publication Date
TWI246256B true TWI246256B (en) 2005-12-21
TW200603553A TW200603553A (en) 2006-01-16

Family

ID=37191374

Family Applications (1)

Application Number Title Priority Date Filing Date
TW93120054A TWI246256B (en) 2004-07-02 2004-07-02 Apparatus for audio compression using mixed wavelet packets and discrete cosine transformation

Country Status (1)

Country Link
TW (1) TWI246256B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8260609B2 (en) 2006-07-31 2012-09-04 Qualcomm Incorporated Systems, methods, and apparatus for wideband encoding and decoding of inactive frames
TWI449033B (en) * 2008-07-11 2014-08-11 弗勞恩霍夫爾協會 Audio encoder and method for encoding segments of coefficients, audio decoder and method for decoding an encoded audio stream, and computer program
TWI477983B (en) * 2012-10-12 2015-03-21 Hwa Hsia Inst Of Technology Analysis method of analog signal and digital signal
CN110311687A (en) * 2019-07-09 2019-10-08 南京天数智芯科技有限公司 A kind of time series data lossless compression method based on Integrated Algorithm

Families Citing this family (12)

Publication number Priority date Publication date Assignee Title
US8260609B2 (en) 2006-07-31 2012-09-04 Qualcomm Incorporated Systems, methods, and apparatus for wideband encoding and decoding of inactive frames
US9324333B2 (en) 2006-07-31 2016-04-26 Qualcomm Incorporated Systems, methods, and apparatus for wideband encoding and decoding of inactive frames
TWI449033B (en) * 2008-07-11 2014-08-11 弗勞恩霍夫爾協會 Audio encoder and method for encoding segments of coefficients, audio decoder and method for decoding an encoded audio stream, and computer program
US8930202B2 (en) 2008-07-11 2015-01-06 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio entropy encoder/decoder for coding contexts with different frequency resolutions and transform lengths
US10242681B2 (en) 2008-07-11 2019-03-26 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder and audio decoder using coding contexts with different frequency resolutions and transform lengths
US10685659B2 (en) 2008-07-11 2020-06-16 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio entropy encoder/decoder for coding contexts with different frequency resolutions and transform lengths
US11670310B2 (en) 2008-07-11 2023-06-06 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio entropy encoder/decoder with different spectral resolutions and transform lengths and upsampling and/or downsampling
US11942101B2 (en) 2008-07-11 2024-03-26 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio entropy encoder/decoder with arithmetic coding and coding context
US12039985B2 (en) 2008-07-11 2024-07-16 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio entropy encoder/decoder with coding context and coefficient selection
TWI477983B (en) * 2012-10-12 2015-03-21 Hwa Hsia Inst Of Technology Analysis method of analog signal and digital signal
CN110311687A (en) * 2019-07-09 2019-10-08 南京天数智芯科技有限公司 A kind of time series data lossless compression method based on Integrated Algorithm
CN110311687B (en) * 2019-07-09 2022-10-04 上海天数智芯半导体有限公司 Time sequence data lossless compression method based on integration algorithm

Also Published As

Publication number Publication date
TW200603553A (en) 2006-01-16

Similar Documents

Publication Publication Date Title
JP5788833B2 (en) Audio signal encoding method, audio signal decoding method, and recording medium
US20210090581A1 (en) Energy lossless-encoding method and apparatus, audio encoding method and apparatus, energy lossless-decoding method and apparatus, and audio decoding method and apparatus
TWI601130B (en) Audio encoding apparatus
TWI604437B (en) Bit allocating method, bit allocating apparatus and computer readable recording medium
US7689427B2 (en) Methods and apparatus for implementing embedded scalable encoding and decoding of companded and vector quantized audio data
EP1715476B1 (en) Low-bitrate encoding/decoding method and system
JP4570250B2 (en) System and method for entropy encoding quantized transform coefficients of a signal
JP3926726B2 (en) Encoding device and decoding device
JP3354863B2 (en) Audio data encoding / decoding method and apparatus with adjustable bit rate
JP2022050609A (en) Audio-acoustic coding device, audio-acoustic decoding device, audio-acoustic coding method, and audio-acoustic decoding method
CN111179946B (en) Lossless encoding method and lossless decoding method
TW200406096A (en) Improved low bit-rate audio coding systems and methods that use expanding quantizers with arithmetic coding
TWI390502B (en) Processing of encoded signals
CN104485111A (en) Audio/voice coding device and audio/voice decoding device
Sinha et al. The perceptual audio coder (PAC)
US20040002854A1 (en) Audio coding method and apparatus using harmonic extraction
CN106233112B (en) Coding method and equipment and signal decoding method and equipment
JP2004199075A (en) Stereo audio encoding/decoding method and device capable of bit rate adjustment
TWI246256B (en) Apparatus for audio compression using mixed wavelet packets and discrete cosine transformation
Yu et al. A scalable lossy to lossless audio coder for MPEG-4 lossless audio coding
CN112365896B (en) Object-oriented encoding method based on stack type sparse self-encoder
WO2005096508A1 (en) Enhanced audio encoding and decoding equipment, method thereof
US20090170435A1 (en) Data format conversion for bluetooth-enabled devices
Teh et al. Subband coding of high-fidelity quality audio signals at 128 kbps
Chang et al. Scalable embedded zero tree wavelet packet audio coding

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees