JP4223795B2 - Wavelet transform apparatus and wavelet transform method

Info

Publication number: JP4223795B2
Application number: JP2002362726A
Authority: JP
Inventors: 水野 雄介 (Yusuke Mizuno)
Original assignee: MegaChips Corp
Current assignee: MegaChips Corp
Priority date: 2002-12-13
Filing date: 2002-12-13
Publication date: 2009-02-12
Anticipated expiration: 2022-12-13
Also published as: JP2004194224A

Description

【０００１】
【Technical Field of the Invention】
The present invention relates to compression and decompression techniques that use the wavelet transform.
【０００２】
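【Prior Art】
Image compression and decompression based on the discrete wavelet transform (hereinafter "DWT") is known as a high-efficiency coding scheme for image data, and is adopted in JPEG 2000 (Joint Photographic Experts Group 2000), which is standardized by ISO (International Organization for Standardization). Two ways of computing the DWT are known: convolution, and computation based on the lifting scheme. Both produce the same result, but the lifting scheme has advantages over convolution: it computes faster with less memory, and it is well suited to lossless (reversible) compression.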
【０００３】
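In general, a DWT can be built from a filter bank that splits the original signal into a high-band (high-frequency) component and a low-band (low-frequency) component. Its inverse (inverse DWT) can be built from a filter bank that synthesizes the band-split high-band and low-band components.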
【０００４】
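FIG. 35 schematically shows the filter banks 200S and 200A used for the DWT and its inverse (inverse DWT). The analysis filter bank 200S, which decomposes the input signal x(n) into two bands (a low-band component and a high-band component), consists of a low-pass filter 201L that passes the low-band component, a high-pass filter 201H that passes the high-band component, and first and second downsamplers 202 and 203. The low-pass filter 201L and the high-pass filter 201H are FIR filters that perform convolution. The first and second downsamplers 202 and 203 each drop every other sample of the signal received from filters 201L and 201H, halving the signal length. In the JPEG 2000 standard, the first downsampler 202 drops the odd-numbered samples and outputs the even-numbered samples (low-band component), and the second downsampler 203 drops the even-numbered samples and outputs the odd-numbered samples (high-band component).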
【０００５】
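The synthesis filter bank 200A, which synthesizes the input signals (low-band and high-band components), consists of first and second upsamplers 204 and 205, a low-pass filter 206L, a high-pass filter 206H, and an adder 207. The low-pass filter 206L and the high-pass filter 206H are FIR filters that perform convolution; in general, the synthesis filters 206L and 206H and the analysis filters 201L and 201H are designed to satisfy the perfect reconstruction condition. The first and second upsamplers 204 and 205 insert a zero between successive samples, doubling the signal length. The adder 207 adds the signals output from the synthesis filters 206L and 206H and outputs the synthesized signal x'(n). When the perfect reconstruction condition is satisfied, x(n) = x'(n) holds.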
【０００６】
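A two-dimensional DWT can be performed by repeatedly applying the analysis filter bank 200S to two-dimensional image data, first in the vertical direction and then in the horizontal direction. FIG. 36 is a band-division diagram schematically showing two-dimensional image data 210 to which a DWT with a decomposition level of 3 has been applied. Each block in the two-dimensional image data 210 represents a subband (band component). For example, subband HH1 consists of the vertical high-band component (H) and the horizontal high-band component (H) at decomposition level 1, and subband LH2 consists of the vertical high-band component (H) and the horizontal low-band component (L) at decomposition level 2. In general, subband XYn (X and Y are each "H" or "L", and n is the decomposition level) consists of the vertical component Y and the horizontal component X at decomposition level n.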
【０００７】
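The DWT at decomposition level 3 proceeds as follows. First, applying the analysis filter bank 200S twice to the whole two-dimensional image generates the level-1 subbands HH1, HL1, LH1, and LL1 (not shown). Next, applying the analysis filter bank 200S twice to the lowest-band level-1 subband LL1 generates the level-2 subbands HH2, HL2, LH2, and LL2 (not shown). Finally, applying the analysis filter bank 200S twice to the lowest-band level-2 subband LL2 generates the level-3 subbands HH3, HL3, LH3, and LL3.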
【０００８】
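Conversely, the inverse DWT that synthesizes the level-3 subbands proceeds as follows. First, applying the synthesis filter bank 200A twice to subbands HH3, HL3, LH3, and LL3 generates the lowest-band level-2 subband LL2. Next, applying the synthesis filter bank 200A twice to the level-2 subbands HH2, HL2, LH2, and LL2 generates the lowest-band level-1 subband LL1. Finally, applying the synthesis filter bank 200A twice to the level-1 subbands HH1, HL1, LH1, and LL1 generates the two-dimensional image.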
【０００９】
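The example above uses three decomposition levels; JPEG 2000 generally uses from about three to eight or more. Also, in this example the DWT was applied to an entire still image at once, but in practice, because of constraints such as the installed memory capacity, a still image is often divided into rectangular regions called "tiles" and the DWT is performed tile by tile.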
【００１０】
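The DWT and the inverse DWT can also be realized with a lifting scheme. Because the present invention concerns the synthesis side, the description from here on covers the inverse DWT. For the well-known 9×7-tap Daubechies filter, the relationship between the input data Y(2n), Y(2n+1), Y(2n+2), and so on (n: integer) and the output data X(2n), X(2n+1) can be expressed by the lifting scheme defined in equation (1) below. Because the synthesis-side processing is an inverse DWT, the letter Y denotes input data and the letter X denotes output data throughout the rest of this description.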
【００１１】
【Equation 1】

【００１２】
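In equation (1), the odd-numbered input data Y(2n+1) is the high-band data obtained by the decomposition, and the even-numbered input data Y(2n) is the low-band data obtained by the decomposition. The output data X(2n) and X(2n+1) is the data in which the high-band and low-band components have been synthesized. The coefficients α, β, γ, and δ are called lifting coefficients, and κ and 1/κ are called normalization coefficients; α, β, γ, δ, κ, and 1/κ are uniquely derived from the filter coefficients of the 9×7-tap Daubechies filter.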
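Equation (1) itself is not reproduced in this text. The following is a reconstruction inferred from the lattice rules and the per-step computations described in the surrounding paragraphs, not the original figure. The grouping into numbered steps is an assumption; the description only identifies the two normalizations as the first steps and the S² line as [step 3].

```latex
\begin{aligned}
S^{1}_{n} &= \kappa\,Y(2n), \qquad D^{1}_{n} = \tfrac{1}{\kappa}\,Y(2n+1) && \text{(normalization)}\\
S^{2}_{n} &= S^{1}_{n} - \delta\,\bigl(D^{1}_{n-1} + D^{1}_{n}\bigr)\\
D^{2}_{n} &= D^{1}_{n} - \gamma\,\bigl(S^{2}_{n} + S^{2}_{n+1}\bigr)\\
X(2n) &= S^{2}_{n} - \beta\,\bigl(D^{2}_{n-1} + D^{2}_{n}\bigr)\\
X(2n+1) &= D^{2}_{n} - \alpha\,\bigl(X(2n) + X(2n+2)\bigr)
\end{aligned}
```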
【００１３】
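The lifting scheme defined by equation (1) can be represented by the lattice structure shown in FIG. 37. The lattice points in the leftmost vertical column of FIG. 37 represent the input data …, Y(2n−1), Y(2n), …, Y(2n+9), Y(2n+10), … (n: integer). That is, the low-band and high-band data produced by the DWT are arranged alternately. The lattice points at the right ends of the line segments extending horizontally to the right from these inputs represent the output data …, X(2n−1), X(2n), …, X(2n+9), X(2n+10), ….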
【００１４】
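The lattice points on the line segment that extends from the lattice point of each input datum Y(k) (k: integer) to the lattice point of the output datum X(k) represent one series of intermediate data. For example, on the line segment between input Y(2n) and output X(2n) there are lattice points representing the intermediate data S¹_n and S²_n generated from input Y(2n).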
【００１５】
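Computation on this lattice follows rules (A) to (C). (A) The data at a lattice point moves along the line segments extending to the right from that point. (B) Data moving along a line segment is multiplied by the coefficient attached to that segment (coefficient multiplication). (C) At each lattice point, the data arriving from the left along the line segments is summed (addition). For example, the intermediate datum S²_n on the line segment between Y(2n) and X(2n) is calculated as S²_n = 1×S¹_n − δ×D¹_{n−1} − δ×D¹_n. This expression corresponds to [step 3] in equation (1).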
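The lattice rules can be sketched in a few lines of Python. This is a minimal illustration, not the patented circuit: the numeric constants are the commonly published JPEG 2000 9/7 values (an assumption, since this text gives no numbers), and `mirror`, `lift`, `inverse_97`, and `forward_97` are hypothetical helper names. The boundary rule (symmetric index reflection) is also an assumption. `forward_97` is simply the algebraic inverse of `inverse_97`, included so the round trip can be checked.

```python
# Assumed constants: widely published JPEG 2000 9/7 lifting values.
ALPHA, BETA, GAMMA, DELTA = -1.586134342, -0.052980118, 0.882911075, 0.443506852
KAPPA = 1.230174105

def mirror(i, n):
    """Reflect index i into [0, n) (whole-sample symmetric extension)."""
    if i < 0:
        i = -i
    if i >= n:
        i = 2 * (n - 1) - i
    return i

def lift(v, parity, coeff):
    """In place: v[k] += coeff * (v[k-1] + v[k+1]) for every k of one parity."""
    n = len(v)
    for k in range(parity, n, 2):
        v[k] += coeff * (v[mirror(k - 1, n)] + v[mirror(k + 1, n)])

def inverse_97(y):
    """Synthesis: even inputs are low-band, odd inputs are high-band."""
    v = [s * (KAPPA if k % 2 == 0 else 1.0 / KAPPA) for k, s in enumerate(y)]
    lift(v, 0, -DELTA)   # S1 -> S2
    lift(v, 1, -GAMMA)   # D1 -> D2
    lift(v, 0, -BETA)    # S2 -> X(even)
    lift(v, 1, -ALPHA)   # D2 -> X(odd)
    return v

def forward_97(x):
    """Undo inverse_97 step by step, in reverse order."""
    v = list(x)
    lift(v, 1, ALPHA)
    lift(v, 0, BETA)
    lift(v, 1, GAMMA)
    lift(v, 0, DELTA)
    return [s * (1.0 / KAPPA if k % 2 == 0 else KAPPA) for k, s in enumerate(v)]
```

Because each lifting step only changes samples of one parity using neighbors of the other parity, every step can be undone exactly by flipping the sign of its coefficient, which is why the round trip reproduces the input up to rounding.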
【００１６】
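As FIG. 37 shows, the intermediate datum S²_n, for example, is the sum of the data arriving from the three lattice points D¹_{n−1}, S¹_n, and D¹_n to its left. Every intermediate datum is likewise calculated by summing three data items arriving from three lattice points to its left. The JPEG 2000 scheme recommends splitting the calculation of one intermediate datum into two steps (see Mathias Larsson Carlander, Media Lab, Ericsson Research, Sweden, "JPEG2000 Verification Model 9.1 (Technical description) WG1 N2165", 28 June 2001). FIG. 38 schematically shows this recommended calculation. The lattice points x₁, x₂, x₃, and y represent data, and α, β, and γ represent the coefficients attached to the line segments connecting the lattice points. As illustrated, the datum y is calculated in step b after the temporary datum z has been calculated in step a.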
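The two-step split can be sketched as follows, using the sign convention of the S²_n example (center value minus coefficient times each neighbor). The function names are hypothetical, and the sketch only shows that step a followed by step b yields the same value as the single three-point sum.

```python
def three_point(prev, center, nxt, coeff):
    """One lattice node computed in a single operation."""
    return center - coeff * prev - coeff * nxt

def step_a(prev, center, coeff):
    """Step a: temporary datum from the center value and the preceding neighbor."""
    return center - coeff * prev

def step_b(temp, nxt, coeff):
    """Step b: finish the node once the following neighbor is available."""
    return temp - coeff * nxt

# Example with arbitrary values (not taken from the source).
z = step_a(2.0, 5.0, 0.5)
y = step_b(z, 3.0, 0.5)
```

The split matters for streaming hardware because step a needs only data that has already arrived, while step b can wait for the next sample.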
【００１７】
【Non-Patent Document 1】
Mathias Larsson Carlander, Media Lab, Ericsson Research, Sweden, "JPEG2000 Verification Model 9.1 (Technical description) WG1 N2165", 28 June 2001.
【００１８】
【Problem to Be Solved by the Invention】
However, the lifting computation recommended by JPEG 2000 takes a long time to calculate each output datum, as explained below.
【００１９】
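FIGS. 39 to 48 are lattice diagrams illustrating an example procedure for the inverse DWT using the lifting scheme. Although not shown, every line segment connecting lattice points carries the coefficient shown in FIG. 37. In FIGS. 39 to 48, a solid black lattice point represents data that has been input or fully calculated, a lattice point with only its upper half filled represents temporary data for which only step a has been completed, and an empty lattice point represents data for which neither step a nor step b has been performed. Each process shown in these figures is executed within one clock cycle.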
【００２０】
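In the N-th process (N: integer) shown in FIG. 39, the input datum Y(2n+4) in target region N1 is normalized, which yields the first-stage intermediate datum S¹_{n+2} of the series starting at the even-numbered input Y(2n+4).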
【００２１】
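The (N+1)-th to (N+4)-th processes shown in FIGS. 40 to 43 all execute step a. The (N+1)-th process (FIG. 40) uses the two intermediate data S¹_{n+2} and D¹_{n+1} in target region A1 to calculate the second-stage temporary datum (S²_{n+2}) of the series starting at the even-numbered input Y(2n+4). (Temporary data is written in parentheses to distinguish it.) The (N+2)-th process (FIG. 41) uses the two intermediate data D¹_{n+1} and S²_{n+1} in target region A2 to calculate the second-stage temporary datum (D²_{n+1}) of the series starting at the odd-numbered input Y(2n+3). The (N+3)-th process (FIG. 42) uses the two intermediate data S²_{n+1} and D²_n in target region A3 to calculate the temporary output datum (X(2n+2)) on the series starting at the even-numbered input Y(2n+2). The (N+4)-th process (FIG. 43) uses the two data D²_n and X(2n) in target region A4 to calculate the temporary output datum (X(2n+1)) on the series starting at the odd-numbered input Y(2n+1).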
【００２２】
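In the next, (N+5)-th, process (FIG. 44), the input datum Y(2n+5) in target region N2 is normalized, which yields the first-stage intermediate datum D¹_{n+2} on the series starting at the odd-numbered input Y(2n+5).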
【００２３】
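Next, the (N+6)-th to (N+9)-th processes shown in FIGS. 45 to 48 all execute step b. The (N+6)-th process (FIG. 45) uses the intermediate datum D¹_{n+2} in target region B1 and the temporary datum (S²_{n+2}) calculated in the (N+1)-th process to calculate the intermediate datum S²_{n+2}. The (N+7)-th process (FIG. 46) uses the intermediate datum S²_{n+2} in target region B2, calculated in the (N+6)-th process, and the temporary datum (D²_{n+1}) calculated in the (N+2)-th process to calculate the intermediate datum D²_{n+1}. The (N+8)-th process (FIG. 47) uses the intermediate datum D²_{n+1} in target region B3, calculated in the (N+7)-th process, and the temporary output datum (X(2n+2)) calculated in the (N+3)-th process to calculate the output datum X(2n+2). The (N+9)-th process (FIG. 48) uses the output datum X(2n+2) in target region B4, calculated in the (N+8)-th process, and the temporary output datum (X(2n+1)) calculated in the (N+4)-th process to calculate the output datum X(2n+1).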
【００２４】
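Next, the (N+10)-th process (not shown) normalizes the input datum Y(2n+6) in the same way as the N-th process, and thereafter processes equivalent to the (N+1)-th to (N+9)-th processes are repeated.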
【００２５】
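Thus, taking in the input data Y(2n+4) and Y(2n+5), in which high-band and low-band components alternate, and calculating the synthesized outputs X(2n+2) and X(2n+1) requires ten clock cycles, from the N-th to the (N+9)-th process. On average, therefore, five clock cycles are needed per output datum. A processing method is sought that shortens these five clock cycles so that the inverse DWT can be computed faster.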
【００２６】
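In view of these problems, the object of the present invention is to provide a wavelet transform apparatus and a wavelet transform method that can execute a lifting-based wavelet transform efficiently in a short time.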
【００２７】
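【Means for Solving the Problem】
To solve the above problem, the invention of claim 1 is a wavelet transform apparatus that, based on a lifting scheme, synthesizes band-split high-band component data and low-band component data, comprising: a control unit; and a filtering unit that takes in an input data sequence, in which a first data sequence consisting of one of the high-band and low-band components and a second data sequence consisting of the other are arranged alternately pixel by pixel, and calculates a synthesized output data sequence; wherein the filtering unit includes normalization means that executes one or more normalization processes, each converting an input datum into first-stage intermediate data within one clock cycle per point by multiplying each item of the input data sequence by a predetermined normalization coefficient, and intermediate data conversion means that executes one or more conversion processes, each converting, within one clock cycle per point, the first-stage intermediate data normalized by the normalization means into a series of intermediate data spanning one or more stages, or converting final-stage intermediate data into output data; and wherein the control unit causes the normalization means and the intermediate data conversion means to repeat the one or more normalization processes and the one or more conversion processes until the output data of all points has been calculated, and controls them so that at least two of the repeatedly executed normalization and conversion processes are executed in parallel within one clock cycle.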
【００２８】
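The invention of claim 2 is the wavelet transform apparatus of claim 1, wherein the normalization means and the intermediate data conversion means execute the normalization process and the conversion process in parallel.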
【００２９】
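The invention of claim 3 is the wavelet transform apparatus of claim 1 or claim 2, wherein the normalization means includes a normalization coefficient multiplier that multiplies each input datum by the normalization coefficient, and a delay unit that delays the data output from the normalization coefficient multiplier; the intermediate data conversion means includes a two-point operation unit, consisting of a lifting coefficient multiplier that multiplies one of two intermediate data by a predetermined lifting coefficient and an adder that adds the data output from the lifting coefficient multiplier to the other of the two intermediate data, and an output destination selection unit that takes in the data output from the two-point operation unit and outputs it to the destination designated by the control unit; the wavelet transform apparatus further comprises a memory management unit and a memory that temporarily stores data under the control of the memory management unit; and the memory management unit performs control so that the data output from the output destination selection unit is transferred to and stored in the memory.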
【００３０】
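The invention of claim 4 is the wavelet transform apparatus of claim 3, wherein the control unit controls the two-point operation unit to repeatedly execute the following conversion processes until the output data of all points has been calculated. Here the "second series" is a series originating from input data belonging to the second data sequence, and the "first series" is a series originating from input data belonging to the first data sequence. A first conversion process calculates, within one clock cycle per point, second-stage temporary data on a second series by adding the first-stage intermediate data on that second series to the data obtained by multiplying, by a predetermined lifting coefficient, the first-stage intermediate data on the first series one point before it. A second conversion process calculates, within one clock cycle per point, second-stage intermediate data on that second series by adding the temporary data calculated in the first conversion process and stored in the memory to the data obtained by multiplying, by a predetermined lifting coefficient, the first-stage intermediate data on the first series one point after the series of that temporary data. A third conversion process calculates, within one clock cycle per point, second-stage temporary data on a first series by adding the first-stage intermediate data on that first series to the data obtained by multiplying, by a predetermined lifting coefficient, the second-stage intermediate data on the second series one point before it. A fourth conversion process calculates, within one clock cycle per point, the second-stage intermediate data on that first series by adding the temporary data calculated in the third conversion process and stored in the memory to the data obtained by multiplying, by a predetermined lifting coefficient, the second-stage intermediate data on the second series one point after the series of that temporary data. A fifth conversion process calculates, within one clock cycle per point, (M+1)-th-stage temporary data on a second series (M being an integer of 1 or more) by adding the M-th-stage intermediate data on that second series to the data obtained by multiplying, by a predetermined lifting coefficient, the M-th-stage intermediate data on the first series one point before its series. A sixth conversion process calculates, within one clock cycle per point, (M+1)-th-stage intermediate data on that second series by adding the temporary data calculated in the fifth conversion process and stored in the memory to the data obtained by multiplying, by a predetermined lifting coefficient, the M-th-stage intermediate data on the first series one point after the series of that temporary data. A seventh conversion process calculates, within one clock cycle per point, (L+1)-th-stage temporary data on a first series (L being an integer of 1 or more) by adding the L-th-stage intermediate data on that first series to the data obtained by multiplying, by a predetermined lifting coefficient, the (L+1)-th-stage intermediate data on the second series one point before its series. An eighth conversion process calculates, within one clock cycle per point, the (L+1)-th-stage intermediate data on that first series by adding the temporary data calculated in the seventh conversion process and stored in the memory to the data obtained by multiplying, by a predetermined lifting coefficient, the (L+1)-th-stage intermediate data on the second series one point after the series of that temporary data.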
【００３１】
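The invention of claim 5 is the wavelet transform apparatus of claim 4, wherein the control unit causes the two-point operation unit to execute the first and third conversion processes and then the fifth and seventh conversion processes until the final-stage temporary data has been calculated, and thereafter to execute the second and fourth conversion processes and then the sixth and eighth conversion processes until the output data has been calculated.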
【００３２】
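The invention of claim 6 is the wavelet transform apparatus of claim 4, comprising four of the two-point operation units operating independently of one another, wherein the control unit causes the two-point operation units to execute in parallel, as the conversion processes, the following four processes: the second conversion process, which calculates the second-stage intermediate data on the series starting at the P-th input datum (P being an integer data number) of the input data sequence, that datum belonging to the second data sequence; the third conversion process, which calculates the second-stage temporary data on the series starting at the (P−1)-th input datum; the sixth conversion process, which calculates the (M+1)-th-stage intermediate data on the series starting at the (P−4)-th input datum; and the seventh conversion process, which calculates the (L+1)-th-stage temporary data on the series starting at the (P−5)-th input datum. The control unit also causes the two-point operation units to execute in parallel the following four processes: the first conversion process, which calculates the second-stage temporary data on the series starting at the (P+2)-th input datum; the fourth conversion process, which calculates the second-stage intermediate data on the series starting at the (P−1)-th input datum; the fifth conversion process, which calculates the (M+1)-th-stage temporary data on the series starting at the (P−2)-th input datum; and the eighth conversion process, which calculates the (L+1)-th-stage intermediate data on the series starting at the (P−5)-th input datum.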
【００３３】
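The invention of claim 7 is the wavelet transform apparatus of claim 1 or claim 2, wherein the normalization means includes a normalization coefficient multiplier that multiplies each input datum by the normalization coefficient, and a delay unit that delays the data output from the normalization coefficient multiplier; the intermediate data conversion means includes a three-point operation unit, consisting of a first adder that adds the first and second of three input data items taken in, a lifting coefficient multiplier that multiplies the data output from the first adder by a predetermined lifting coefficient, and a second adder that calculates intermediate data by adding the data output from the lifting coefficient multiplier to the third input datum, and an output destination selection unit that takes in the intermediate data output from the three-point operation unit and outputs it to the destination designated by the control unit; and the memory management unit performs control so that the intermediate data output from the output destination selection unit is transferred to and stored in the memory.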
【００３４】
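The invention of claim 8 is the wavelet transform apparatus of claim 7, wherein the control unit controls the three-point operation unit to repeatedly execute the following conversion processes until the output data of all points has been calculated. Here the "second series" is a series starting at input data belonging to the second data sequence, and the "first series" is a series starting at input data belonging to the first data sequence. A first conversion process calculates, within one clock cycle per point, second-stage intermediate data on a second series by adding the first-stage intermediate data on that second series to the data obtained by multiplying, by a predetermined lifting coefficient, the sum of the two first-stage intermediate data on the first series one point before and one point after its series. A second conversion process calculates, within one clock cycle per point, second-stage intermediate data on a first series by adding the first-stage intermediate data on that first series to the data obtained by multiplying, by a predetermined lifting coefficient, the sum of the two second-stage intermediate data on the second series one point before and one point after its series. A third conversion process calculates, within one clock cycle per point, (M+1)-th-stage intermediate data on a second series (M being an integer of 1 or more) by adding the M-th-stage intermediate data on that second series to the data obtained by multiplying, by a predetermined lifting coefficient, the sum of the two M-th-stage intermediate data on the first series one point before and one point after its series. A fourth conversion process calculates, within one clock cycle per point, (L+1)-th-stage intermediate data on a first series (L being an integer of 1 or more) by adding the L-th-stage intermediate data on that first series to the data obtained by multiplying, by a predetermined lifting coefficient, the sum of the two (L+1)-th-stage intermediate data on the second series one point before and one point after its series.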
【００３５】
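The invention of claim 9 is the wavelet transform apparatus of claim 8, comprising two of the three-point operation units operating independently of each other, wherein the control unit causes the three-point operation units to execute in parallel two processes: the second conversion process, which calculates the intermediate data on the series starting at the P-th input datum (P being an integer data number) of the input data sequence, that datum belonging to the first data sequence; and the fourth conversion process, which calculates the (L+1)-th-stage intermediate data on the series starting at the (P−4)-th input datum.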
【００３６】
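The invention of claim 10 is the wavelet transform apparatus of claim 8 or claim 9, wherein the control unit causes the three-point operation units to execute in parallel two processes: the first conversion process, which calculates the intermediate data on the series starting at the (P+3)-th input datum (P being an integer data number) of the input data sequence; and the third conversion process, which calculates the (M+1)-th-stage intermediate data on the series starting at the (P−1)-th input datum.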
【００３７】
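The invention of claim 11 is the wavelet transform apparatus of claim 8, wherein the control unit performs control so that the first to fourth conversion processes are executed in parallel.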
【００３８】
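The invention of claim 12 is the wavelet transform apparatus of any one of claims 1 to 11, wherein the filtering unit consists of a first filtering unit and a second filtering unit connected in series; the first filtering unit receives the high-band and low-band data that have been band-split in one of the horizontal and vertical directions, synthesizes them, and calculates the result line by line; and the second filtering unit processes the synthesized data calculated by the first filtering unit to calculate the synthesized data in the other of the horizontal and vertical directions.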
【００３９】
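The invention of claim 13 is a wavelet transform method that, based on a lifting scheme, synthesizes band-split high-band component data and low-band component data, comprising: (a) a step of selectively taking in input data from an input data sequence in which a first data sequence consisting of one of the high-band and low-band components and a second data sequence consisting of the other are arranged alternately pixel by pixel; (b) a step of converting each input datum taken in at step (a) into first-stage intermediate data within one clock cycle per point by multiplying it by a normalization coefficient; and (c) a step of calculating (m+1)-th-stage intermediate data from m-th-stage intermediate data (m being an integer of 1 or more) within one clock cycle per point (including the case where the m-th-stage intermediate data is the final-stage intermediate data, in which case the (m+1)-th-stage intermediate data is the output data); wherein steps (b) and (c) are repeated until the output data of all points has been calculated, and the repeatedly executed steps (b) and (c) are executed in parallel within one clock cycle.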
【００４０】
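The invention of claim 14 is the wavelet transform method of claim 13, wherein step (c) comprises the following steps, and steps (c-1) to (c-8) are controlled so as to be repeated until the output data of all points has been calculated. Here the "second series" is a series originating from input data belonging to the second data sequence, and the "first series" is a series originating from input data belonging to the first data sequence. (c-1) A step of calculating, within one clock cycle per point, second-stage temporary data on a second series by adding the first-stage intermediate data on that second series to the data obtained by multiplying, by a predetermined lifting coefficient, the first-stage intermediate data on the first series one point before it. (c-2) A step of calculating, within one clock cycle per point, second-stage intermediate data on that second series by adding the temporary data calculated at step (c-1) and stored in the memory to the data obtained by multiplying, by a predetermined lifting coefficient, the first-stage intermediate data on the first series one point after the series of that temporary data. (c-3) A step of calculating, within one clock cycle per point, second-stage temporary data on a first series by adding the first-stage intermediate data on that first series to the data obtained by multiplying, by a predetermined lifting coefficient, the second-stage intermediate data on the second series one point before it. (c-4) A step of calculating, within one clock cycle per point, second-stage intermediate data on that first series by adding the temporary data calculated at step (c-3) and stored in the memory to the data obtained by multiplying, by a predetermined lifting coefficient, the second-stage intermediate data on the second series one point after the series of that temporary data. (c-5) A step of calculating, within one clock cycle per point, (M+1)-th-stage temporary data on a second series (M being an integer of 1 or more) by adding the M-th-stage intermediate data on that second series to the data obtained by multiplying, by a predetermined lifting coefficient, the M-th-stage intermediate data on the first series one point before its series. (c-6) A step of calculating, within one clock cycle per point, (M+1)-th-stage intermediate data on that second series by adding the temporary data calculated at step (c-5) and stored in the memory to the data obtained by multiplying, by a predetermined lifting coefficient, the M-th-stage intermediate data on the first series one point after the series of that temporary data. (c-7) A step of calculating, within one clock cycle per point, (L+1)-th-stage temporary data on a first series (L being an integer of 1 or more) by adding the L-th-stage intermediate data on that first series to the data obtained by multiplying, by a predetermined lifting coefficient, the (L+1)-th-stage intermediate data on the second series one point before its series. (c-8) A step of calculating, within one clock cycle per point, (L+1)-th-stage intermediate data on that first series by adding the temporary data calculated at step (c-7) and stored in the memory to the data obtained by multiplying, by a predetermined lifting coefficient, the (L+1)-th-stage intermediate data on the second series one point after the series of that temporary data.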
【００４１】
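The invention of claim 15 is the wavelet transform method of claim 14, wherein steps (c-1) and (c-3) are executed and then steps (c-5) and (c-7) are executed until the temporary data of the output data has been calculated, and thereafter steps (c-2) and (c-4) are executed and then steps (c-6) and (c-8) are executed until the output data has been calculated.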
【００４２】
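The invention of claim 16 is the wavelet transform method of claim 14, wherein control is performed so that the following four steps are executed in parallel by the respective two-point operation units: step (c-2), which calculates the second-stage intermediate data on the series starting at the P-th input datum (P being an integer data number) of the input data sequence, that datum belonging to the second data sequence; step (c-3), which calculates the second-stage temporary data on the series starting at the (P−1)-th input datum; step (c-6), which calculates the (M+1)-th-stage intermediate data on the series starting at the (P−4)-th input datum; and step (c-7), which calculates the (L+1)-th-stage temporary data on the series starting at the (P−5)-th input datum. Control is also performed so that the following four processes are executed in parallel: step (c-1), which calculates the second-stage temporary data on the series starting at the (P+2)-th input datum; step (c-4), which calculates the second-stage intermediate data on the series starting at the (P−1)-th input datum; step (c-5), which calculates the (M+1)-th-stage temporary data on the series starting at the (P−2)-th input datum; and step (c-8), which calculates the (L+1)-th-stage intermediate data on the series starting at the (P−5)-th input datum.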
【００４３】
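The invention of claim 17 is the wavelet transform method of claim 13, wherein step (c) comprises the following steps, and steps (c-1) to (c-4) are repeated until the output data of all points has been calculated. Here the "second series" is a series starting at input data belonging to the second data sequence, and the "first series" is a series starting at input data belonging to the first data sequence. (c-1) A step of calculating, within one clock cycle per point, second-stage intermediate data on a second series by adding the first-stage intermediate data on that second series to the data obtained by multiplying, by a predetermined lifting coefficient, the sum of the two first-stage intermediate data on the first series one point before and one point after its series. (c-2) A step of calculating, within one clock cycle per point, second-stage intermediate data on a first series by adding the first-stage intermediate data on that first series to the data obtained by multiplying, by a predetermined lifting coefficient, the sum of the two second-stage intermediate data on the second series one point before and one point after its series. (c-3) A step of calculating, within one clock cycle per point, (M+1)-th-stage intermediate data on a second series (M being an integer of 1 or more) by adding the M-th-stage intermediate data on that second series to the data obtained by multiplying, by a predetermined lifting coefficient, the sum of the two M-th-stage intermediate data on the first series one point before and one point after its series. (c-4) A step of calculating, within one clock cycle per point, (L+1)-th-stage intermediate data on a first series (L being an integer of 1 or more) by adding the L-th-stage intermediate data on that first series to the data obtained by multiplying, by a predetermined lifting coefficient, the sum of the two (L+1)-th-stage intermediate data on the second series one point before and one point after its series.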
【００４４】
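The invention of claim 18 is the wavelet transform method of claim 17, wherein control is performed so that two processes are executed in parallel: step (c-2), which calculates the second-stage intermediate data on the series starting at the P-th input datum (P being an integer data number) of the input data sequence, that datum belonging to the first data sequence; and step (c-4), which calculates the (L+1)-th-stage intermediate data on the series starting at the (P−4)-th input datum.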
【００４５】
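The invention of claim 19 is the wavelet transform method of claim 17 or claim 18, wherein control is performed so that two processes are executed in parallel: step (c-1), which calculates the intermediate data on the series starting at the (P+3)-th input datum (P being an integer data number) of the input data sequence; and step (c-3), which calculates the (M+1)-th-stage intermediate data on the series starting at the (P−1)-th input datum.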
【００４６】
The invention of claim 20 is the wavelet transform method of claim 17, wherein steps (c-1) to (c-4) are executed in parallel.
【００４７】
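The invention of claim 21 is the wavelet transform method of any one of claims 13 to 20, wherein, for two-dimensional image data band-split into low-band and high-band components, a synthesized data sequence is calculated by applying steps (a) to (c) line by line in one of the horizontal and vertical directions of the two-dimensional image data, and steps (a) to (c) are then applied to the calculated synthesized data sequence in the other of the horizontal and vertical directions.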
【００４８】
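【Embodiments of the Invention】
＜First Embodiment＞
A wavelet transform apparatus and a wavelet transform method according to the first embodiment of the present invention are described below. FIG. 1 shows the schematic configuration of the wavelet transform apparatus 1 according to the first embodiment. The wavelet transform apparatus 1 comprises a buffer 8 that temporarily holds the data of the high-band or low-band subbands produced by the wavelet transform, an MMU (memory management unit) 2 that operates in synchronization with an externally supplied clock signal CLK, a first ring memory 3A, a horizontal filtering unit 4A, a line buffer circuit 5, a second ring memory 3B, and a vertical filtering unit 4B. The first ring memory 3A, horizontal filtering unit 4A, line buffer circuit 5, second ring memory 3B, and vertical filtering unit 4B operate in synchronization with an externally supplied pixel clock signal PCLK.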
【００４９】
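In this embodiment the MMU 2, the horizontal filtering unit 4A, and the vertical filtering unit 4B are implemented in hardware, but they may instead be implemented as a computer program containing instructions executed by a microprocessor.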
【００５０】
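Subband data input to the wavelet transform apparatus 1 is temporarily stored in the buffer 8. The wavelet transform apparatus 1 applies one pass of line-based two-dimensional inverse DWT to the subband data. The horizontal filtering unit 4A and the vertical filtering unit 4B are connected in series through the line buffer circuit 5 and the second ring memory 3B. As described later, the subband data is filtered horizontally by the horizontal filtering unit 4A and then filtered vertically by the vertical filtering unit 4B. To perform a two-dimensional inverse DWT on data with two or more decomposition levels, the wavelet transform apparatus 1 is simply used two or more times.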
【００５１】
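The MMU 2 controls data input and output among the buffer 8, the first ring memory 3A, and the second ring memory 3B, and can transfer input data read from the buffer 8 to the first ring memory 3A for storage. The horizontal filtering unit 4A filters the data received from the first ring memory 3A in the horizontal direction and, in eight cycles of the pixel clock signal PCLK, calculates two points of output data in which the horizontal high-band and low-band components have been synthesized, and outputs them to the line buffer circuit 5. The average time needed to calculate one output datum is therefore four clock cycles.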
【００５２】
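The data output from the line buffer circuit 5 is stored in the second ring memory 3B. The MMU 2 feeds data from the second ring memory 3B to the vertical filtering unit 4B. The vertical filtering unit 4B filters the input data in the vertical direction and, in eight cycles of the pixel clock signal PCLK, calculates and outputs two points of output data in which the vertical high-band and low-band components have been synthesized.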
【００５３】
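The horizontal filtering unit 4A and the vertical filtering unit 4B have the same configuration. FIG. 2 shows the schematic configuration of the filtering unit 4 (horizontal filtering unit 4A or vertical filtering unit 4B). The ring memory 3 shown in FIG. 2 represents either the first ring memory 3A or the second ring memory 3B of FIG. 1.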
【００５４】
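The filtering unit 4 comprises a first data selector 11 that selectively takes in input data, a first coefficient multiplier 12, a delay register 16, a second data selector 17, a second coefficient multiplier 18, an adder 22, an output destination selection unit (DMUX) 23, and a control unit 24. The control unit 24 operates in synchronization with the pixel clock signal PCLK. According to the values of the selection control signals SEL0 and SEL1 supplied from the control unit 24, the first and second data selectors 11 and 17 output the input data taken in from the ring memory 3 and the data held in the delay register 16 to their first terminal S0 and second terminal S1, respectively.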
【００５５】
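According to the control signal C0 supplied from the control unit 24, the first coefficient multiplier 12 multiplies the data output from the first terminal S0 of the first data selector 11 by one of the normalization coefficients κ and 1/κ and outputs the result (normalization process). The data output from the first coefficient multiplier 12 is delayed by one cycle of the pixel clock signal PCLK in the delay register 16 and then input to the second data selector 17. The first coefficient multiplier 12 and the delay register 16 constitute the normalization means of the present invention.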
【００５６】
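According to the control signal C1 supplied from the control unit 24, the second coefficient multiplier 18 multiplies the data output from the first terminal S0 of the second data selector 17 by one of the lifting coefficients −α, −β, −γ, and −δ and outputs the result (coefficient multiplication). The adder 22 adds the data output from the second coefficient multiplier 18 to the data output from the second terminal S1 of the second data selector 17 and outputs the sum to the output destination selection unit 23 (addition). Data normalized by 1/κ is also output from the third terminal S2 of the second data selector 17 to the external MMU 2. The MMU 2 can transfer the data output from the third terminal S2 of the second data selector 17 to the ring memory 3 for storage.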
【００５７】
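According to the value of the selection control signal SEL2 supplied from the control unit 24, the output destination selection unit 23 outputs the data received from the adder 22 from one of its first to third terminals K0 to K2. The coefficient multiplication in the second coefficient multiplier 18 and the addition in the adder 22 are executed within one clock cycle per point. The time needed to multiply one input datum by a lifting coefficient and add it is therefore one cycle of the pixel clock signal PCLK.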
【００５８】
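The second coefficient multiplier 18 (which contains the coefficient register 19) and the adder 22 constitute a two-point operation unit that takes in two input data items from the data selector 17 and operates on them. This two-point operation unit and the output destination selection unit 23 constitute the intermediate data calculation means.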
【００５９】
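The data output from the first terminal K0 and the second terminal K1 of the output destination selection unit 23 is output externally as output data in which the low-band and high-band input data have been synthesized.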
【００６０】
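The data output from the second terminal K1 of the output destination selection unit 23 is also branched to the external MMU 2, and the data output from the third terminal K2 is output to the external MMU 2. The MMU 2 can transfer the data output from the second terminal K1 and the third terminal K2 to the ring memory 3 for storage.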
【００６１】
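A representative example of the lifting computation using the filtering unit 4 is described below with reference to FIGS. 3 to 10. FIGS. 3 to 10 are lattice diagrams schematically showing the lifting scheme of the 9×7-tap Daubechies filter. Computation on these lattices is performed in the same way as for FIG. 37. For convenience, FIGS. 3 to 10 do not show the lifting coefficients −α, −β, −γ, −δ or the normalization coefficients κ and 1/κ attached to the line segments connecting the lattice points.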
【００６２】
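As FIGS. 3 to 10 show, the input data …, Y(2n−1), Y(2n), …, Y(2n+9), … are each converted into a series of lattice points (intermediate data) spanning several stages and output as the output data …, X(2n−1), X(2n), …, X(2n+9), …. For example, the input datum Y(2n) passes through two stages of intermediate data (lattice points) and is then output as the output datum X(2n). In what follows, the process of normalizing input data to generate intermediate data is called the normalization process (step 1 and step 2 in equation (1)), and the process of calculating the other intermediate data is called the conversion process (step 3 and step 4 above). In this embodiment and in the other embodiments described later, two stages of intermediate data are calculated in each series, but the invention is not limited to this; lifting schemes that calculate only one stage of intermediate data are also possible. Indeed, for 5×3-tap and 13×7-tap filters, a lifting scheme that calculates only one stage of intermediate data is possible.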
【００６３】
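FIGS. 3 to 10 show the N-th (N: integer) to (N+7)-th processes of this embodiment. The N-th process (FIG. 3) executes the two-point operation of step a (FIG. 38) on the two intermediate data S¹_{n+2} and D¹_{n+1} in target region A1 within one cycle of the pixel clock signal PCLK (one clock cycle), and calculates the second-stage temporary datum (S²_{n+2}) on the series starting at the even-numbered input Y(2n+4). That is, step a is executed using the even-numbered intermediate datum S¹_{n+2} and the odd-numbered intermediate datum D¹_{n+1} on the series one point before it.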
【００６４】
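In the cycle one clock before the operation on target region A1, the normalization process that multiplies the input datum Y(2n+4) by the normalization coefficient κ has been executed in target region N1, so that the first-stage intermediate datum S¹_{n+2} on the series of Y(2n+4) has already been calculated.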
【００６５】
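The details of this N-th process are as follows. The ring memory 3 shown in FIG. 2 has storage areas for five lines (series) of input data, intermediate data, and temporary data, and is structured so that new data is written in turn over storage areas holding old data that has already been referenced.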
【００６６】
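First, the processing in target region N1, executed one clock cycle earlier, is described. The MMU 2 causes the ring memory 3 to output the temporarily stored input datum Y(2n+4) to the first data selector 11. The control unit 24 supplies the selection control signal SEL0 to the first data selector 11 so that the input datum Y(2n+4) is output to the first coefficient multiplier 12. Within the first coefficient multiplier 12, the normalization coefficient κ selected according to the control signal C0 from the control unit 24 is supplied to the multiplier 14, and the multiplier 14 multiplies the input datum Y(2n+4) by κ and outputs the result κ×Y(2n+4) = S¹_{n+2} to the delay register 16. This coefficient multiplication in the first coefficient multiplier 12 is executed within one clock cycle.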
【００６７】
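One clock cycle later, the intermediate datum S¹_{n+2} held in the delay register 16 is output to the second data selector 17. The MMU 2 also causes the ring memory 3 to output the temporarily stored intermediate datum D¹_{n+1} to the first data selector 11. According to the selection control signal SEL0 supplied from the control unit 24, the first data selector 11 outputs the intermediate datum D¹_{n+1} from its second terminal S1, and this datum is input to the second data selector 17. According to the selection control signal SEL1 supplied from the control unit 24, the second data selector 17 outputs the intermediate datum D¹_{n+1} from its first terminal S0 to the second coefficient multiplier 18 and the intermediate datum S¹_{n+2} from its second terminal S1 to the adder 22.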
【００６８】
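Within the second coefficient multiplier 18, the lifting coefficient δ selected according to the control signal C1 from the control unit 24 is supplied to the multiplier 20, and the multiplier 20 multiplies the intermediate datum D¹_{n+1} by δ and outputs the result δ×D¹_{n+1} to the two's complement circuit 21. The two's complement circuit 21 inverts the sign of its input and outputs −δ×D¹_{n+1} to the adder 22. The adder 22 adds the two data −δ×D¹_{n+1} and S¹_{n+2} to calculate the temporary datum (S²_{n+2}) and outputs it to the output destination selection unit 23. This calculation of the temporary datum (S²_{n+2}) is executed within one clock cycle.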
【００６９】
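The output destination selection unit 23 outputs the temporary datum (S²_{n+2}) to the external MMU 2 from the third terminal K2, selected according to the value of the selection control signal SEL2 supplied from the control unit 24. The MMU 2 transfers the temporary datum (S²_{n+2}) to the ring memory 3 and writes it over the already-referenced storage area that held the input datum Y(2n+4).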
【００７０】
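The next, (N+1)-th, process (FIG. 4) executes the two-point operation of step a on the two intermediate data D¹_{n+1} and S²_{n+1} in target region A2 within one clock cycle, and calculates the second-stage temporary datum (D²_{n+1}) on the series starting at the odd-numbered input Y(2n+3). The intermediate datum S²_{n+1} is the second-stage datum on the series starting at the input Y(2n+2), one point before the input Y(2n+3). Specifically, the MMU 2 causes the ring memory 3 to output the already calculated intermediate data D¹_{n+1} and S²_{n+1} to the first data selector 11. Under the control of the control unit 24, the first data selector 11 outputs D¹_{n+1} and S²_{n+1} from its second and third terminals S1 and S2, respectively, to the second data selector 17. Under the control of the control unit 24, the second data selector 17 then outputs S²_{n+1} from its first terminal S0 to the second coefficient multiplier 18 and D¹_{n+1} from its second terminal S1 to the adder 22.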
【００７１】
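Within the second coefficient multiplier 18, the lifting coefficient γ selected according to the control signal C1 from the control unit 24 is supplied to the multiplier 20, and the multiplier 20 multiplies the intermediate datum S²_{n+1} by γ and outputs the result γ×S²_{n+1} to the two's complement circuit 21. The adder 22 adds −γ×S²_{n+1}, output by the two's complement circuit 21, to D¹_{n+1}, output by the second data selector 17, to calculate the temporary datum (D²_{n+1}), and outputs it to the output destination selection unit 23. Under the control of the control unit 24, the output destination selection unit 23 outputs the temporary datum (D²_{n+1}) from the third terminal K2 to the external MMU 2, and the MMU 2 transfers it to the ring memory 3 and writes it over the already-referenced storage area that held the intermediate datum D¹_{n+1}.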
【００７２】
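The next, (N+2)-th, process (FIG. 5) executes the two-point operation of step a on the two intermediate data S²_{n+1} and D²_n in target region A3 within one clock cycle, and calculates the temporary output datum (X(2n+2)) on the series starting at the even-numbered input Y(2n+2). The intermediate datum D²_n is the second-stage intermediate datum on the series starting at the input Y(2n+1), one point before the input Y(2n+2). Specifically, the MMU 2 causes the ring memory 3 to output the already calculated intermediate data S²_{n+1} and D²_n to the first data selector 11. Under the control of the control unit 24, the first data selector 11 outputs S²_{n+1} and D²_n from its second and third terminals S1 and S2, respectively, to the second data selector 17. Under the control of the control unit 24, the second data selector 17 then outputs D²_n from its first terminal S0 to the second coefficient multiplier 18 and S²_{n+1} from its second terminal S1 to the adder 22.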
【００７３】
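The second coefficient multiplier 18 multiplies the intermediate datum D²_n by the lifting coefficient β and outputs the weighted datum β×D²_n to the two's complement circuit 21. The adder 22 adds −β×D²_n, output by the two's complement circuit 21, to S²_{n+1}, output by the second data selector 17, to calculate the temporary output datum (X(2n+2)), and outputs it to the output destination selection unit 23. Under the control of the control unit 24, the output destination selection unit 23 outputs the temporary datum (X(2n+2)) from the third terminal K2 to the external MMU 2, and the MMU 2 transfers it to the ring memory 3 and writes it over the already-referenced storage area that held the intermediate datum S²_{n+1}.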
【００７４】
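The next, (N+3)-th, process (FIG. 6) executes the two-point operation of step a on the intermediate datum D²_n and the output datum X(2n) in target region A4 within one clock cycle, and calculates the temporary output datum (X(2n+1)) on the series starting at the odd-numbered input Y(2n+1). Specifically, the MMU 2 causes the ring memory 3 to output the already calculated intermediate datum D²_n and output datum X(2n) to the first data selector 11. Under the control of the control unit 24, the first data selector 11 outputs D²_n and X(2n) from its second and third terminals S1 and S2, respectively, to the second data selector 17. Under the control of the control unit 24, the second data selector 17 then outputs X(2n) from its first terminal S0 to the second coefficient multiplier 18 and D²_n from its second terminal S1 to the adder 22.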
【００７５】
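The second coefficient multiplier 18 multiplies X(2n) by the lifting coefficient α and outputs the weighted datum α×X(2n) to the two's complement circuit 21. The adder 22 adds −α×X(2n), output by the two's complement circuit 21, to D²_n, output by the second data selector 17, to calculate the temporary output datum (X(2n+1)), and outputs it to the output destination selection unit 23. Under the control of the control unit 24, the output destination selection unit 23 outputs the temporary datum (X(2n+1)) from the third terminal K2 to the external MMU 2, and the MMU 2 transfers it to the ring memory 3 and writes it over the already-referenced storage area that held the intermediate datum D²_n.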
【００７６】
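The next, (N+4)-th, process (FIG. 7) executes the two-point operation of step b (FIG. 38) on the temporary datum (S²_{n+2}) calculated in the N-th process (FIG. 3) and the intermediate datum D¹_{n+2} in target region B1 within one clock cycle, and calculates the second-stage intermediate datum S²_{n+2} on the series starting at the even-numbered input Y(2n+4). The intermediate datum D¹_{n+2} lies on the series one point after the series of the temporary datum (S²_{n+2}).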
【００７７】
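In the cycle one clock before the operation on target region B1, the normalization process that multiplies the input datum Y(2n+5) by the normalization coefficient 1/κ is executed in target region N2, so that the first-stage intermediate datum D¹_{n+2} on the series of Y(2n+5) has already been calculated.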
【００７８】
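The detailed processing is described starting from the preceding clock cycle. In the cycle one clock before the operation on target region B1, the MMU 2 causes the ring memory 3 to output the temporarily stored input datum Y(2n+5) to the first data selector 11. The control unit 24 supplies the selection control signal SEL0 to the first data selector 11 so that Y(2n+5) is output to the first coefficient multiplier 12. Under the control of the control unit 24, the first coefficient multiplier 12 multiplies Y(2n+5) by the normalization coefficient 1/κ and outputs the result (1/κ)×Y(2n+5) = D¹_{n+2} to the delay register 16. This coefficient multiplication in the first coefficient multiplier 12 is executed within one clock cycle.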
【００７９】
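One clock cycle later, the intermediate datum D¹_{n+2} held in the delay register 16 is output to the second data selector 17. The MMU 2 also causes the ring memory 3 to output the temporarily stored temporary datum (S²_{n+2}) to the first data selector 11. The control unit 24 supplies the selection control signal SEL0 to the first data selector 11 so that the temporary datum (S²_{n+2}) is output to the second data selector 17.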
【００８０】
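The control unit 24 then supplies the selection control signal SEL1 to the second data selector 17 so that the intermediate datum D¹_{n+2} is output from its first terminal S0 to the second coefficient multiplier 18 and the temporary datum (S²_{n+2}) is output from its second terminal S1 to the adder 22. In addition, under the control of the control unit 24, the second data selector 17 outputs the intermediate datum D¹_{n+2} from its third terminal S2 to the external MMU 2, and the MMU 2 transfers it to the ring memory 3 and writes it over the already-referenced storage area that held the input datum Y(2n+5).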
【００８１】
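Within the second coefficient multiplier 18, the lifting coefficient δ selected according to the control signal C1 from the control unit 24 is supplied to the multiplier 20; the multiplier 20 multiplies the intermediate datum D¹_{n+2} by δ and outputs the result δ×D¹_{n+2} to the two's complement circuit 21, which outputs −δ×D¹_{n+2} to the adder 22. The adder 22 adds the two data −δ×D¹_{n+2} and the temporary datum (S²_{n+2}) to calculate the intermediate datum S²_{n+2}, and outputs it to the output destination selection unit 23. This calculation of the intermediate datum S²_{n+2} is executed within one clock cycle.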
【００８２】
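The output destination selection unit 23 outputs the intermediate datum S²_{n+2} to the external MMU 2 from the third terminal K2, selected according to the value of the selection control signal SEL2 supplied from the control unit 24. The MMU 2 transfers the intermediate datum S²_{n+2} to the ring memory 3 and writes it over the already-referenced storage area that held the temporary datum (S²_{n+2}).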
【００８３】
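The next, (N+5)-th, process (FIG. 8) executes the two-point operation of step b on the temporary datum (D²_{n+1}) calculated in the (N+1)-th process (FIG. 4) and the intermediate datum S²_{n+2} in target region B1 calculated in the (N+4)-th process (FIG. 7) within one clock cycle, and calculates the second-stage intermediate datum D²_{n+1} on the series starting at the odd-numbered input Y(2n+3). The intermediate datum S²_{n+2} is the second-stage datum on the series one point after the series of the temporary datum (D²_{n+1}).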
【００８４】
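Specifically, the MMU 2 causes the ring memory 3 to output the temporary datum (D²_{n+1}) and the intermediate datum S²_{n+2} to the first data selector 11. Under the control of the control unit 24, the first data selector 11 outputs (D²_{n+1}) and S²_{n+2} from its second and third terminals S1 and S2 to the second data selector 17. Under the control of the control unit 24, the second data selector 17 then outputs S²_{n+2} from its first terminal S0 to the second coefficient multiplier 18 and (D²_{n+1}) from its second terminal S1 to the adder 22. The second coefficient multiplier 18 multiplies S²_{n+2} by the lifting coefficient γ and inverts the sign in the two's complement circuit 21. The adder 22 adds the weighted datum −γ×S²_{n+2} to the temporary datum (D²_{n+1}) to calculate the intermediate datum D²_{n+1}, and outputs it to the output destination selection unit 23. Under the control of the control unit 24, the output destination selection unit 23 outputs D²_{n+1} from the third terminal K2 to the external MMU 2, and the MMU 2 transfers it to the ring memory 3 and writes it over the already-referenced storage area that held the temporary datum (D²_{n+1}).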
【００８５】
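The next, (N+6)-th, process (FIG. 9) executes the two-point operation of step b on the temporary output datum (X(2n+2)) calculated in the (N+2)-th process (FIG. 5) and the intermediate datum D²_{n+1} in target region B2 calculated in the (N+5)-th process (FIG. 8) within one clock cycle, and calculates the output datum X(2n+2) on the series starting at the even-numbered input Y(2n+2). The intermediate datum D²_{n+1} is the second-stage intermediate datum on the series one point after the series of the temporary output datum (X(2n+2)).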
【００８６】
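Specifically, the MMU 2 causes the ring memory 3 to output the temporary datum (X(2n+2)) and the intermediate datum D²_{n+1} to the first data selector 11. Under the control of the control unit 24, the first data selector 11 outputs (X(2n+2)) and D²_{n+1} from its second and third terminals S1 and S2 to the second data selector 17. Under the control of the control unit 24, the second data selector 17 then outputs D²_{n+1} from its first terminal S0 to the second coefficient multiplier 18 and (X(2n+2)) from its second terminal S1 to the adder 22. The second coefficient multiplier 18 multiplies D²_{n+1} by the lifting coefficient β and inverts the sign in the two's complement circuit 21. The adder 22 adds the weighted datum −β×D²_{n+1} to the temporary datum (X(2n+2)) to calculate the output datum X(2n+2), and outputs it to the output destination selection unit 23. Under the control of the control unit 24, the output destination selection unit 23 outputs X(2n+2) from the second terminal K1 both externally and to the external MMU 2, and the MMU 2 transfers it to the ring memory 3 and writes it over the already-referenced storage area that held the temporary datum (X(2n+2)).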
【００８７】
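The next, (N+7)-th, process (FIG. 10) executes the two-point operation of step b on the temporary output datum (X(2n+1)) calculated in the (N+3)-th process (FIG. 6) and the output datum X(2n+2) in target region B4 calculated in the (N+6)-th process (FIG. 9) within one clock cycle, and calculates the output datum X(2n+1) on the series starting at the odd-numbered input Y(2n+1). The output datum X(2n+2) is the output datum on the series one point after the series of the temporary output datum (X(2n+1)).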
【００８８】
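Specifically, the MMU 2 causes the ring memory 3 to output the temporary datum (X(2n+1)) and the output datum X(2n+2) to the first data selector 11. Under the control of the control unit 24, the first data selector 11 outputs (X(2n+1)) and X(2n+2) from its second and third terminals S1 and S2 to the second data selector 17. Under the control of the control unit 24, the second data selector 17 then outputs X(2n+2) from its first terminal S0 to the second coefficient multiplier 18 and (X(2n+1)) from its second terminal S1 to the adder 22. The second coefficient multiplier 18 multiplies X(2n+2) by the lifting coefficient α and inverts the sign in the two's complement circuit 21. The adder 22 adds the weighted datum −α×X(2n+2) to the temporary datum (X(2n+1)) to calculate the output datum X(2n+1), and outputs it to the output destination selection unit 23. Under the control of the control unit 24, the output destination selection unit 23 outputs X(2n+1) externally from the first terminal K0.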
【００８９】
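The next, (N+8)-th, process (not shown) is the same as the N-th process (FIG. 3) except for the target region. Thereafter the (N+1)-th to (N+7)-th processes are repeated. In this way, processes equivalent to the N-th process (FIG. 3) through the (N+7)-th process (FIG. 10) are executed while the target region is shifted, until all output data …, X(2n−1), X(2n), … has been calculated.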
【００９０】
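In this embodiment, as shown by the N-th to (N+3)-th processes, the two-point operation of step a is executed until the final-stage temporary output datum (X(2n+1)) has been calculated; then, as shown by the (N+4)-th to (N+7)-th processes, the two-point operation of step b converts all the temporary data calculated in the N-th to (N+3)-th processes into intermediate data or output data.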
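The eight-process schedule can be modeled in software to check that it produces the same values as computing each lattice node in one three-point operation. The sketch below is a behavioral model under stated assumptions, not the circuit of FIG. 2: it uses dictionaries in place of the five-line ring memory, treats samples outside the input as zero (the source does not describe boundary handling), uses the commonly published JPEG 2000 9/7 constants, and simply counts eight cycles per input pair because the two normalizations overlap with other operations. `stream_inverse` and `batch_inverse` are hypothetical names.

```python
from collections import defaultdict

# Assumed constants: widely published JPEG 2000 9/7 lifting values.
ALPHA, BETA, GAMMA, DELTA = -1.586134342, -0.052980118, 0.882911075, 0.443506852
KAPPA = 1.230174105

def stream_inverse(y):
    """Process one (even, odd) input pair per iteration: 4 x step a, then 4 x step b."""
    Y = defaultdict(float, enumerate(y))
    S1, D1, S2, D2, X = (defaultdict(float) for _ in range(5))
    tS2, tD2, tX = (defaultdict(float) for _ in range(3))   # temporary data
    cycles = 0
    for m in range(len(y) // 2 + 2):                    # two extra pairs flush the pipeline
        S1[m] = KAPPA * Y[2 * m]                        # normalization (overlapped)
        tS2[m] = S1[m] - DELTA * D1[m - 1]              # process N   (step a)
        tD2[m - 1] = D1[m - 1] - GAMMA * S2[m - 1]      # process N+1 (step a)
        tX[2*m - 2] = S2[m - 1] - BETA * D2[m - 2]      # process N+2 (step a)
        tX[2*m - 3] = D2[m - 2] - ALPHA * X[2*m - 4]    # process N+3 (step a)
        D1[m] = Y[2 * m + 1] / KAPPA                    # normalization (overlapped)
        S2[m] = tS2[m] - DELTA * D1[m]                  # process N+4 (step b)
        D2[m - 1] = tD2[m - 1] - GAMMA * S2[m]          # process N+5 (step b)
        X[2*m - 2] = tX[2*m - 2] - BETA * D2[m - 1]     # process N+6 (step b)
        X[2*m - 3] = tX[2*m - 3] - ALPHA * X[2*m - 2]   # process N+7 (step b)
        cycles += 8
    return [X[k] for k in range(len(y))], cycles

def batch_inverse(y):
    """Reference: every node computed as one three-point sum, zero outside the input."""
    Y = defaultdict(float, enumerate(y))
    rng = range(-3, len(y) // 2 + 4)
    S1 = defaultdict(float, {m: KAPPA * Y[2*m] for m in rng})
    D1 = defaultdict(float, {m: Y[2*m + 1] / KAPPA for m in rng})
    S2 = defaultdict(float, {m: S1[m] - DELTA * (D1[m-1] + D1[m]) for m in rng})
    D2 = defaultdict(float, {m: D1[m] - GAMMA * (S2[m] + S2[m+1]) for m in rng})
    Xe = defaultdict(float, {m: S2[m] - BETA * (D2[m-1] + D2[m]) for m in rng})
    Xo = {m: D2[m] - ALPHA * (Xe[m] + Xe[m+1]) for m in rng}
    return [Xe[k // 2] if k % 2 == 0 else Xo[k // 2] for k in range(len(y))]
```

In this model each pair of outputs costs eight counted cycles, which matches the four-cycles-per-point average stated for the embodiment, compared with ten cycles per pair when each normalization occupies its own cycle.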
【００９１】
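As described above, in the wavelet transform method of this embodiment, the process of normalizing the input data …, Y(2n), Y(2n+1), … and the conversion process that converts normalized intermediate data into other intermediate data are executed simultaneously, in parallel, within one clock cycle. The average time needed to calculate one output datum can therefore be four clock cycles, which shortens the output calculation period.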
【００９２】
Next, line-based two-dimensional inverse DWT processing using the wavelet transform apparatus 1 is described.
【００９３】
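As FIG. 11 shows, the subbands (band components) input to the horizontal filtering unit 4A are subbands 23LL and 23HL, or subbands 23LH and 23HH. Subband 23LL consists of the horizontal low-band component (L) and the vertical low-band component (L); subband 23HL consists of the horizontal high-band component (H) and the vertical low-band component (L); subband 23LH consists of the horizontal low-band component (L) and the vertical high-band component (H); and subband 23HH consists of the horizontal high-band component (H) and the vertical high-band component (H).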
【００９４】
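When the subbands input to the horizontal filtering unit 4A are subbands 23LL and 23HL, or subbands 23LH and 23HH, the input data …, Y(n−1), Y(n), Y(n+1), … shown in FIGS. 3 to 10 is the horizontal data of subbands 23LL and 23HL arranged alternately, or the horizontal data of subbands 23LH and 23HH arranged alternately. Horizontal filtering of the input data formed from subbands 23LL and 23HL performs horizontal synthesis and outputs subband 23L. Horizontal filtering of the input data formed from subbands 23LH and 23HH performs horizontal synthesis and outputs subband 23H. The output data …, X(n−1), X(n), X(n+1), … shown in FIGS. 3 to 10 represents one horizontal line of subband 23L or subband 23H.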
【００９５】
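Next, as FIG. 12 shows, the subbands input to the vertical filtering unit 4B are subband 23L and subband 23H. In this case the input data …, Y(n−1), Y(n), Y(n+1), … shown in FIGS. 3 to 10 is the vertical data of subbands 23L and 23H arranged alternately. Vertical filtering of the input data formed from subbands 23L and 23H performs vertical synthesis and outputs the image data 23. The output data …, X(n−1), X(n), X(n+1), … shown in FIGS. 3 to 10 represents one vertical line of the image data 23. The image data 23 is rectangular data W pixels wide and H pixels high.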
【００９６】
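Subbands 23LL, 23HL, 23LH, and 23HH are rectangular data H/2 pixels high and W/2 pixels wide. As FIG. 12 schematically shows, they are input to the horizontal filtering unit 4A as data sequences …, Y_i(2n), Y_i(2n+1), Y_i(2n+2), … arranged in the vertical direction, with subband 23LL (even rows, even columns) and subband 23HL (even rows, odd columns) forming one pair, or subband 23LH (odd rows, even columns) and subband 23HH (odd rows, odd columns) forming one pair. The subscript i of the input datum Y_i(k) denotes the number of the pixel column to which Y_i(k) belongs; i takes the values 0, 1, …, W−1 (W: number of horizontal pixels). In the figure, the data is divided into two regions, a storage area 24L for the even rows pairing subbands 23LL and 23HL and a storage area 24H for the odd rows pairing subbands 23LH and 23HH, but the data layout in memory is not limited to this.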
【００９７】
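Specifically, the first ring memory 3A and the horizontal filtering unit 4A repeat each of the processes, including the N-th process (FIG. 3) through the (N+7)-th process (FIG. 10), pixel by pixel while alternating between the low-band side (storage area 24L) and the high-band side (storage area 24H).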
【００９８】
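For example, the N-th process (FIG. 3) is executed once on the first pixel row on the storage area 24L side, then the (N+1)-th process (FIG. 4) is executed once, then the (N+2)-th process (FIG. 5) is executed once, and so on. The same processing is then executed on the first pixel row on the storage area 24H side, next on the second pixel row on the 24L side and then on the second pixel row on the 24H side, next on the third pixel row on the 24L side and then on the third pixel row on the 24H side, and so on, until finally it is executed on the (H/2)-th pixel row on the 24L side and then on the (H/2)-th pixel row on the 24H side.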
【００９９】
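As FIG. 13 schematically shows, the first ring memory 3A has a storage area 26 that holds five points (five pixels) of data corresponding to the input data …, X_j(k), X_{j+1}(k), …, and can hold the temporary data and intermediate data described above.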
【０１００】
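As a result, the horizontal filtering unit 4A outputs, alternately and continuously, the horizontal lines (H/2 in height) of subband 23L, in which subbands 23LL and 23HL have been synthesized, and the horizontal lines (H/2 in height) of subband 23H, in which subbands 23LH and 23HH have been synthesized. The horizontal lines of subband 23L are buffered in the L line buffer 5L in the line buffer circuit 5, and the horizontal lines of subband 23H are buffered in the H line buffer 5H in the line buffer circuit 5.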
【０１０１】
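For example, when the (N+6)-th process (FIG. 9) is executed continuously on each of the 1st to W-th pixels, one line of horizontally synthesized data for the (2n+2)-th position, X_0(2n+2), X_1(2n+2), …, X_j(2n+2), …, X_{W−1}(2n+2), is output continuously and buffered in the L line buffer 5L. Next, when the (N+7)-th process (FIG. 10) is executed continuously on each of the 1st to W-th pixels, one line of horizontally synthesized data for the (2n+1)-th position, X_0(2n+1), X_1(2n+1), …, X_j(2n+1), …, X_{W−1}(2n+1), is output continuously and buffered in the H line buffer 5H.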
【０１０２】
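Under the control of the MMU 2, the line buffer circuit 5 supplies one horizontal line from the L line buffer 5L and one horizontal line from the H line buffer 5H alternately, one line at a time, to the second ring memory 3B. The data output to the second ring memory 3B is processed by the vertical filtering unit 4B.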
【０１０３】
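Specifically, the second ring memory 3B and the vertical filtering unit 4B repeat the processes, including the N-th process (FIG. 3) through the (N+7)-th process (FIG. 10), for each pixel column, one horizontal line at a time. For example, the N-th process (FIG. 3) is executed on the 0th pixel column, then on the 1st pixel column, then on the 2nd pixel column, and so on, and finally on the (W−1)-th pixel column. Next, the (N+1)-th process (FIG. 4) is executed on the 0th pixel column, then on the 1st, then on the 2nd, and so on, and finally on the (W−1)-th pixel column. In this way each process is executed in turn for all pixel columns. As FIG. 12 schematically shows, the second ring memory 3B has a storage area 24 that holds 5×W points (five lines) of data corresponding to the input data sequence, and can hold the temporary data and intermediate data described above.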
【０１０４】
As a result, the vertical filtering unit 4B outputs the image data 23 from the data rows that it receives one horizontal line at a time.
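The horizontal-then-vertical data flow can be sketched as a separable two-pass function. This is an illustration of the data flow only, under several assumptions: whole subbands are held as lists of rows instead of being streamed through ring memories and line buffers, boundaries are handled by symmetric reflection, and the numeric constants are the commonly published JPEG 2000 9/7 values. `inverse_1d`, `interleave`, and `synthesize_2d` are hypothetical names.

```python
# Assumed constants: widely published JPEG 2000 9/7 lifting values.
ALPHA, BETA, GAMMA, DELTA = -1.586134342, -0.052980118, 0.882911075, 0.443506852
KAPPA = 1.230174105

def inverse_1d(y):
    """1-D synthesis of interleaved data (even index: low band, odd index: high band)."""
    v = [s * (KAPPA if k % 2 == 0 else 1.0 / KAPPA) for k, s in enumerate(y)]
    n = len(v)
    for parity, c in ((0, DELTA), (1, GAMMA), (0, BETA), (1, ALPHA)):
        for k in range(parity, n, 2):
            left = v[k - 1] if k > 0 else v[1]           # reflect at the left edge
            right = v[k + 1] if k + 1 < n else v[n - 2]  # reflect at the right edge
            v[k] -= c * (left + right)
    return v

def interleave(a, b):
    """Alternate the items of a and b: a0, b0, a1, b1, ..."""
    return [item for pair in zip(a, b) for item in pair]

def synthesize_2d(LL, HL, LH, HH):
    # Horizontal pass: 23LL + 23HL -> lines of 23L, 23LH + 23HH -> lines of 23H.
    L_lines = [inverse_1d(interleave(lo, hi)) for lo, hi in zip(LL, HL)]
    H_lines = [inverse_1d(interleave(lo, hi)) for lo, hi in zip(LH, HH)]
    # L and H lines are supplied alternately, then each pixel column is synthesized.
    lines = interleave(L_lines, H_lines)
    columns = [inverse_1d(list(col)) for col in zip(*lines)]
    return [list(row) for row in zip(*columns)]

# Example: a flat LL subband with empty detail subbands gives a flat image.
zeros = [[0.0] * 4 for _ in range(4)]
image = synthesize_2d([[7.0] * 4 for _ in range(4)], zeros, zeros, zeros)
```

Four subbands of size (H/2)×(W/2) yield an H×W image, here 8×8 from 4×4 inputs.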
【０１０５】
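By executing the above processing recursively, band components of any decomposition level can be synthesized and the image data restored. That is, by recursively feeding the subbands LL(k), HL(k), LH(k), and HH(k) of decomposition level k (k being an integer of 2 or more) into the wavelet transform apparatus 1, the subband LL(k−1) of level k−1 can be obtained.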
【０１０６】
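As described above, because the wavelet transform apparatus 1 of this embodiment includes the horizontal filtering unit 4A and the vertical filtering unit 4B having the configuration shown in FIG. 2, the output calculation period can be shortened. A line-based two-dimensional wavelet transform can therefore be performed at high speed.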
【０１０７】
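＜Second Embodiment＞
Next, a wavelet transform apparatus and a wavelet transform method according to the second embodiment of the present invention are described. FIG. 14 shows the schematic configuration of the wavelet transform apparatus 30 according to the second embodiment. The wavelet transform apparatus 30 comprises a buffer 34 that temporarily holds the two-dimensional image data of the subbands, an MMU (memory management unit) 31 that operates in synchronization with an externally supplied clock signal CLK, a first ring memory 32A, a horizontal filtering unit 33A, a second ring memory 32B, and a vertical filtering unit 33B. The first ring memory 32A, horizontal filtering unit 33A, second ring memory 32B, and vertical filtering unit 33B operate in synchronization with an externally supplied pixel clock signal PCLK.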
【０１０８】
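In the figure, the number of pixels or lines of the first and second ring memories 32A and 32B is shown as "8 or 9"; in this second embodiment, the first ring memory 32A is a nine-point ring memory and the second ring memory 32B is a nine-line ring memory.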
【０１０９】
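In this embodiment the MMU 31, the horizontal filtering unit 33A, and the vertical filtering unit 33B are implemented in hardware, but they may instead be implemented as a computer program containing instructions executed by a microprocessor.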
【０１１０】
このウェーブレット変換装置３０に入力したサブバンドの２次元画像データはバッファ３４に一時的に記憶される。ウェーブレット変換装置３０は、２次元画像データにラインベースの２次元逆ＤＷＴを１回施す機能を有し、ｋ＋１次レベルのサブバンド２３ＬＬ，２３ＨＬ，２３ＬＨ，２３ＨＨを合成して、ｋ次のサブバンド２３ＬＬを生成する。水平フィルタリング部３３Ａと垂直フィルタリング部３３Ｂとは、第２リングメモリ３２Ｂを介して直列に接続されている。サブバンドのデータは、水平フィルタリング部３３Ａで水平方向にフィルタリングされた後に、垂直フィルタリング部３３Ｂで垂直方向にフィルタリングされる。２次以上の分解レベルのサブバンドを合成する２次元逆ＤＷＴを実行する場合、このウェーブレット変換装置３０を２回以上繰り返し利用すればよい。
【０１１１】
ＭＭＵ３１は、バッファ３４と第１リングメモリ３２Ａと第２リングメモリ３２Ｂとのデータ入出力を制御する機能を有しており、バッファ３４から読出したサブバンドのデータを第１リングメモリ３２Ａに転送し記憶させることができる。詳しくは、サブバンド２３ＬＬおよび２３ＨＬの水平方向のデータが画素単位で交互に配列されたデータ、および、サブバンド２３ＬＨおよび２３ＨＨの水平方向のデータが画素単位で交互に配列されたデータとが、第１リングメモリ３２Ａに記憶される。水平フィルタリング部３３Ａは、第１リングメモリ３２Ａから入力したデータに対して水平方向にフィルタリングを実行することで、画素クロック信号ＰＣＬＫの１クロック周期で、その高域成分と低域成分とを合成したデータを１点ずつ算出して第２リングメモリ３２Ｂに出力できる。詳しくは、サブバンド２３ＬＬおよび２３ＨＬが合成されたデータと、サブバンド２３ＬＨおよび２３ＨＨが合成されたデータとが交互に出力されて第２リングメモリ３２Ｂに記憶される。
【０１１２】
次に、ＭＭＵ３１は、この第２リングメモリ３２Ｂから垂直フィルタリング部３３Ｂにデータを入力させる。垂直フィルタリング部３３Ｂは、入力したデータに対して垂直方向にフィルタリングを実行することで、画素クロック信号ＰＣＬＫの１クロック周期で、高域成分と低域成分とを合成したデータを１点ずつ算出し、出力する。
【０１１３】
水平フィルタリング部３３Ａの構成と垂直フィルタリング部３３Ｂの構成とは互いに同一である。図１５に、フィルタリング部３３（水平フィルタリング部３３Ａまたは垂直フィルタリング部３３Ｂ）の概略構成を示す。図１５に示すリングメモリ３２は、図１４に示した第１リングメモリ３２Ａと第２リングメモリ３２Ｂとの何れか一方を表すものとする。
【０１１４】
このフィルタリング部３３は、入力データを選択的に取り込む第１データ・セレクタ３５、第１係数乗算器３６、遅延レジスタ４０、第２データ・セレクタ４１、第３データ・セレクタ４２、加算器４３，４８，４９，５４、第２係数乗算器４４、第３係数乗算器５０、出力先選択部（ＤＭＵＸ）５５、および制御部５６を備えて構成される。これら構成要素のうち、２個の加算器４３，４８と第２係数乗算器４４からなる組は、３点のデータを１クロック周期内に処理する３点演算部を構成する。また、２個の加算器４９，５４と第３係数乗算器５０からなる組も同様に３点演算部を構成する。また、これら２組の３点演算部と出力先選択部５５とで中間データ算出手段が構成される。
【０１１５】
制御部５６は、画素クロック信号ＰＣＬＫと同期して動作する。第１データ・セレクタ３５は、この制御部５６から供給される選択制御信号ＳＥＬ０の値に応じて、リングメモリ３２で取り込んだデータを第１端子Ｓ０〜第７端子Ｓ６の何れかから選択的に出力する。
【０１１６】
第１データ・セレクタ３５の第１端子Ｓ０から出力されたデータは、第１係数乗算器３６に入力される。第１係数乗算器３６では、係数レジスタ３７は、制御部５６から供給される制御信号Ｃ０の値に応じて、規格化係数１／κ，κの何れか一方を乗算器３８に出力し、乗算器３８は、入力データにその規格化係数を乗算する規格化処理を１クロック周期内に実行する。第１係数乗算器３６から出力されたデータは、遅延レジスタ４０で画素クロック信号ＰＣＬＫの１クロック周期遅延した後に第２データ・セレクタ４１に入力される。なお、第１係数乗算器３６と遅延レジスタ４０とで本発明の規格化手段が構成される。
【０１１７】
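In the three-point operation unit, the adder 43 adds the two data points output from the first terminal S0 and the second terminal S1 of the third data selector 42 and outputs the sum to the second coefficient multiplier 44. In the second coefficient multiplier 44, the coefficient register 45 selects one of the lifting coefficients δ and α according to the value of the control signal C1 supplied from the control unit 56 and supplies it to the multiplier 46. The multiplier 46 multiplies the input data by that coefficient, and the product has its sign inverted by the two's complement circuit 47 before being output to the adder 48. The adder 48 then adds the data input from the third terminal S2 of the third data selector 42 to the data input from the second coefficient multiplier 44 and outputs the result to the output destination selector 55.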
【０１１８】
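Likewise, the adder 49 adds the two data points output from the fourth terminal S3 and the fifth terminal S4 of the third data selector 42 and outputs the sum to the third coefficient multiplier 50. In the third coefficient multiplier 50, the coefficient register 51 selects one of the lifting coefficients β and γ according to the value of the control signal C2 supplied from the control unit 56 and supplies it to the multiplier 52. The multiplier 52 multiplies the input data by that coefficient, and the product has its sign inverted by the two's complement circuit 53 before being output to the adder 54. The adder 54 adds the data input from the sixth terminal S5 of the third data selector 42 to the data input from the third coefficient multiplier 50 and outputs the result to the output destination selector 55.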
【０１１９】
出力先選択部５５は、制御部５６から供給される選択制御信号ＳＥＬ３の値に応じて、加算器４８，５４から並列に入力する２点のデータを第１端子Ｋ０から第３端子Ｋ２のいずれかから出力する。
【０１２０】
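The data output from the second terminal K1 of the output destination selector 55 is also branched to the external MMU 31, and the data output from the third terminal K2 is output to the external MMU 31. The MMU 31 can transfer the data output from the second terminal K1 and the third terminal K2 to the ring memory 32 and store it there.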
【０１２１】
次に、以上のフィルタリング部３３を用いたリフティング演算の代表例を、図１６〜図１９を参照しつつ以下に説明する。図１６〜図１９は、９×７タップのDaubechiesフィルタのリフティング構成を模式的に示す格子図である。この格子図の演算は、図３７の場合と同様に行われる。なお、図１６〜図１９は、説明の便宜上、各格子点間を結ぶ線分に対応するリフティング係数−α，−β，−γ，−δと規格化係数κ，１／κとを表示していない。
【０１２２】
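Figs. 16 to 19 schematically show the N-th (N is an integer) through (N+3)-th rounds of processing in this embodiment. In the N-th processing (Fig. 16), the two transformations of target regions C1 and C2 are executed in parallel within one clock cycle. In target region C2, a three-point operation is executed: the two output data points X(2n) and X(2n+2) are added, the sum is multiplied by the lifting coefficient −α, and the product is added to the intermediate data D²_n. This yields the output data X(2n+1) on the sequence that starts from the odd-numbered input data Y(2n+1). The two output data points X(2n) and X(2n+2) lie on the two sequences immediately before and after the sequence of the intermediate data D²_n. In target region C1, a three-point operation is executed: the two intermediate data points S²_{n+2} and S²_{n+3} are added, the sum is multiplied by the lifting coefficient −γ, and the product is added to the intermediate data D¹_{n+2}. This yields the second-stage intermediate data D²_{n+2} on the sequence that starts from the odd-numbered input data Y(2n+5). Here, the two intermediate data points S²_{n+2} and S²_{n+3} lie on the two sequences immediately before and after the sequence of the intermediate data D¹_{n+2}.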
【０１２３】
また、上記対象領域Ｃ１およびＣ２における演算の１クロック前の周期に、対象領域Ｎ１において、入力データＹ（２ｎ＋８）に規格化係数κを乗算する規格化処理が実行され、入力データＹ（２ｎ＋８）の系列上の第１段階の中間データであるＳ¹ _n+4が算出される。
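The three-point operation used in target regions C1 and C2 can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the circuit itself; the function name `three_point` and its argument names are hypothetical, and the comments map each step to the adders, multipliers and two's complement circuits of Fig. 15.

```python
def three_point(a: float, b: float, c: float, coeff: float) -> float:
    """One three-point lifting operation: c - coeff * (a + b).

    a, b  : the two neighbouring data points (e.g. X(2n) and X(2n+2))
    c     : the data point on the sequence being updated (e.g. D2_n)
    coeff : the lifting coefficient (alpha, beta, gamma or delta)
    """
    summed = a + b             # adder 43 (or 49)
    product = coeff * summed   # multiplier 46 (or 52)
    negated = -product         # two's complement circuit 47 (or 53)
    return c + negated         # adder 48 (or 54)


# Illustrative numbers only: with coeff = 0.5, 10 - 0.5 * (1 + 3) = 8.
result = three_point(1.0, 3.0, 10.0, 0.5)
```

Because both regions use the same operation with different operands and coefficients, two such units can run side by side in one clock cycle.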
【０１２４】
このＮ回目の具体的な処理の内容は次の通りである。図１５に示すリングメモリ３２は、入力データや中間データや一時データを格納する９ライン（系列）の記憶領域を備えており、参照済みの古いデータを格納する記憶領域に新たなデータを順番に上書きする構造を持つ。
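The overwrite behaviour of the ring memory can be pictured with a small Python model. This is a minimal sketch under assumptions: the class `RingMemory` and its methods are hypothetical names, and the model stores labelled values rather than lines of pixel data.

```python
class RingMemory:
    """Fixed-size store in which new data replaces old, already-referenced data."""

    def __init__(self, size: int = 9):
        self.slots = [None] * size
        self.next = 0  # index of the oldest slot, which is overwritten next

    def push(self, value):
        """Write a new input over the oldest slot, in order."""
        self.slots[self.next] = value
        self.next = (self.next + 1) % len(self.slots)

    def overwrite(self, old, new):
        """Replace an already-referenced value in place,
        e.g. input Y(2n+8) by intermediate data S1_{n+4}."""
        self.slots[self.slots.index(old)] = new


ring = RingMemory(3)
for v in ["Y0", "Y1", "Y2", "Y3"]:
    ring.push(v)                 # "Y3" lands on the slot that held "Y0"
ring.overwrite("Y1", "S1_0")     # in-place update of a referenced value
```

Keeping updates in place is what lets nine sequences of storage suffice for the whole lattice.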
【０１２５】
ＭＭＵ３１は、このリングメモリ３２に一時記憶された入力データＹ（２ｎ＋８）を第１データ・セレクタ３５に出力させる。制御部５６は、選択制御信号ＳＥＬ０を第１データ・セレクタ３５に供給して、入力データＹ（２ｎ＋８）を第１係数乗算器３６に出力させる。第１係数乗算器３６は、制御部５６から供給された制御信号Ｃ０に従って２個の規格化係数κ，１／κのうち後半の係数κを選択して乗算器３８に供給し、乗算器３８は、入力データと規格化係数κとを乗算した乗算値（＝κ×Ｙ（２ｎ＋８）＝Ｓ¹ _n+4）を遅延レジスタ４０に出力する。この第１係数乗算器３６での係数乗算処理は１クロック周期内に実行される。
【０１２６】
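One clock cycle after this coefficient multiplication, the intermediate data S¹_{n+4} stored in the delay register 40 is output to the second data selector 41. According to the selection control signal SEL1 supplied from the control unit 56, the second data selector 41 outputs the intermediate data S¹_{n+4} from its second terminal S1 to the MMU 31, and the MMU 31 transfers it to the ring memory 32, overwriting the storage area of the already-referenced input data Y(2n+8). In the same clock cycle in which S¹_{n+4} is output to the MMU 31, the MMU 31 causes the ring memory 32 to output the six temporarily stored data points X(2n), D²_n, X(2n+2), S²_{n+2}, D¹_{n+2}, S²_{n+3} to the first data selector 35. The first data selector 35 outputs these six data points from its second terminal S1 through seventh terminal S6 according to the value of the selection control signal SEL0 supplied from the control unit 56. This output is then input to the third data selector 42. According to the value of the selection control signal SEL2 supplied from the control unit 56, the third data selector 42 selects the three data points X(2n), X(2n+2), D²_n in target region C2 and outputs them from its first terminal S0 through third terminal S2, and selects the three data points S²_{n+2}, S²_{n+3}, D¹_{n+2} in target region C1 and outputs them from its fourth terminal S3 through sixth terminal S5.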
【０１２７】
上方の加算器４３は、第３データ・セレクタ４２の第１端子Ｓ０と第２端子Ｓ１から入力した２点のデータＸ（２ｎ），Ｘ（２ｎ＋２）を加算したデータを第２係数乗算器４４に出力する。第２係数乗算器４４において、係数レジスタ４５は、制御部５６から供給される制御信号Ｃ１に従って、２個のリフティング係数α，δのうち後半の係数αを選択して乗算器４６に供給し、乗算器４６は、入力データとリフティング係数αとを乗算した乗算値（＝α×（Ｘ（２ｎ）＋Ｘ（２ｎ＋２）））を２の補数演算回路４７に出力する。２の補数演算回路４７において、符号が反転されたデータは、加算器４８に出力される。そして、加算器４８は、第２係数乗算器４４から入力する乗算値と、第３データ・セレクタ４２の第３端子Ｓ２から入力した中間データＤ² _nとを加算することで、対象領域Ｃ２内の出力データＸ（２ｎ＋１）を算出し、出力先選択部５５に出力する。この出力データＸ（２ｎ＋１）の算出処理は１クロック周期内に実行される。
【０１２８】
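Meanwhile, the lower adder 49 adds the two intermediate data points S²_{n+2} and S²_{n+3} input from the fourth terminal S3 and the fifth terminal S4 of the third data selector 42 and outputs the sum to the third coefficient multiplier 50. In the third coefficient multiplier 50, the coefficient register 51 selects the coefficient γ of the two lifting coefficients β and γ according to the control signal C2 supplied from the control unit 56 and supplies it to the multiplier 52, and the multiplier 52 outputs the product of the input data and the lifting coefficient γ (= γ×(S²_{n+2}+S²_{n+3})) to the two's complement circuit 53. The two's complement circuit 53 inverts the sign of this data and outputs it to the adder 54. The adder 54 then adds the product input from the third coefficient multiplier 50 to the intermediate data D¹_{n+2} input from the sixth terminal S5 of the third data selector 42, thereby calculating the intermediate data D²_{n+2} in target region C1, and outputs it to the output destination selector 55. This calculation of the intermediate data D²_{n+2} is executed within one clock cycle.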
【０１２９】
出力先選択部５５は、制御部５６から供給された選択制御信号ＳＥＬ３の値に従って、加算器４８から入力した出力データＸ（２ｎ＋１）を第１端子Ｋ０から出力し、他方の加算器５４から入力した中間データＤ² _n+2を第３端子Ｋ２から外部のＭＭＵ３１に出力し、ＭＭＵ３１は、その中間データＤ² _n+2をリングメモリ３２に転送し、参照済みの記憶領域中間データＤ¹ _n+2に上書きさせる。
【０１３０】
次に、Ｎ＋１回目処理（図１７）における対象領域Ｃ３，Ｃ４の変換処理が行なわれる。対象領域Ｃ３では、２点の中間データＤ¹ _n+3，Ｄ¹ _n+4を加算したデータにリフティング係数−δを乗算することで乗算値を算出した後、この乗算値と中間データＳ¹ _n+4とを加算するという３点演算が実行される。この結果、偶数番目の入力データＹ（２ｎ＋８）を始点とする系列上の第２段階の中間データＳ² _n+4が算出される。ここで、２点の中間データＤ¹ _n+3，Ｄ¹ _n+4は、中間データＳ¹ _n+4に対して１点前後するデータである。また、対象領域Ｃ４では、２点の中間データＤ² _n+1，Ｄ² _n+2を加算したデータにリフティング係数−βを乗算することで乗算値を算出した後、この乗算値と中間データＳ² _n+2とを加算するという３点演算が実行される。この結果、偶数番目の入力データＹ（２ｎ＋４）を始点とする系列上の出力データＸ（２ｎ＋４）が算出される。ここで、２点の中間データＤ² _n+1，Ｄ² _n+2は、中間データＳ² _n+2の系列に対して１点前後する２系列上のデータである。
【０１３１】
また、対象領域Ｃ３，Ｃ４における演算処理が実行される１クロック前の周期において、対象領域Ｎ２における処理が実行される。対象領域Ｎ２においては、入力データＹ（２ｎ＋９）に規格化係数１／κを乗算する規格化処理が実行され、中間データＤ¹ _n+4が出力される。
【０１３２】
このＮ＋１回目の具体的な処理内容は次の通りである。まず、１クロック前の周期に実行される対象領域Ｎ２の処理から説明する。ＭＭＵ３１は、このリングメモリ３２に一時記憶された入力データＹ（２ｎ＋９）を第１データ・セレクタ３５に出力させる。制御部５６は、選択制御信号ＳＥＬ０を第１データ・セレクタ３５に供給して、入力データＹ（２ｎ＋９）を第１係数乗算器３６に出力させる。第１係数乗算器３６は、制御部５６から供給された制御信号Ｃ０に従って２個の規格化係数κ，１／κのうち前半の係数１／κを選択して乗算器３８に供給し、乗算器３８は、入力データと規格化係数１／κとを乗算した乗算値（＝１／κ×Ｙ（２ｎ＋９）＝Ｄ¹ _n+4）を遅延レジスタ４０に出力する。この第１係数乗算器３６での係数乗算処理は１クロック周期内に実行される。
【０１３３】
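One clock cycle after this coefficient multiplication, the intermediate data D¹_{n+4} stored in the delay register 40 is output to the second data selector 41. According to the selection control signal SEL1 supplied from the control unit 56, the second data selector 41 outputs D¹_{n+4} from its first terminal S0 to the third data selector 42 and also from its second terminal S1 to the MMU 31. The MMU 31 transfers D¹_{n+4} to the ring memory 32, overwriting the storage area of the already-referenced input data Y(2n+9). In the same clock cycle in which D¹_{n+4} is output to the third data selector 42, the MMU 31 causes the ring memory 32 to output the five temporarily stored data points D²_{n+1}, S²_{n+2}, D²_{n+2}, D¹_{n+3}, S¹_{n+4} to the first data selector 35. The first data selector 35 outputs these five data points from its second terminal S1 through sixth terminal S5 according to the value of the selection control signal SEL0 supplied from the control unit 56. This output is then input to the third data selector 42. From the five data points, the third data selector 42 selects the three data points D²_{n+1}, D²_{n+2}, S²_{n+2} in target region C4 and outputs them from its fourth terminal S3 through sixth terminal S5. It also selects the data D¹_{n+3}, D¹_{n+4}, S¹_{n+4} in target region C3, that is, two of the five data points together with the data input from the second data selector 41, and outputs them from its first terminal S0 through third terminal S2.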
【０１３４】
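The upper adder 43 adds the two data points D¹_{n+3} and D¹_{n+4} input from the first terminal S0 and the second terminal S1 of the third data selector 42 and outputs the sum to the second coefficient multiplier 44. In the second coefficient multiplier 44, the coefficient register 45 selects the coefficient δ of the two lifting coefficients α and δ according to the control signal C1 supplied from the control unit 56 and supplies it to the multiplier 46, and the multiplier 46 outputs the product of the input data and the lifting coefficient δ (= δ×(D¹_{n+3}+D¹_{n+4})) to the two's complement circuit 47. The two's complement circuit 47 inverts the sign of this data and outputs it to the adder 48. The adder 48 then adds the product input from the second coefficient multiplier 44 to the intermediate data S¹_{n+4} input from the third terminal S2 of the third data selector 42, thereby calculating the intermediate data S²_{n+4} in target region C3, and outputs it to the output destination selector 55. This calculation of the intermediate data S²_{n+4} is executed within one clock cycle.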
【０１３５】
一方、下方の加算器４９は、第３データ・セレクタ４２の第４端子Ｓ３と第５端子Ｓ４とから入力した２点の中間データＤ² _n+1，Ｄ² _n+2を加算したデータを第３係数乗算器５０に出力する。第３係数乗算器５０では、係数レジスタ５１は、制御部５６から供給される制御信号Ｃ２に従って、２個のリフティング係数β，γのうち前半の係数βを選択して乗算器５２に供給し、乗算器５２は、入力データとリフティング係数βとを乗算した乗算値（＝β×（Ｄ² _n+1＋Ｄ² _n+2））を２の補数演算回路５３に出力する。２の補数演算回路５３において、符号が反転されたデータは、加算器５４に出力される。そして、加算器５４は、第３係数乗算器５０から入力する乗算値と、第３データ・セレクタ４２の第６端子Ｓ５から入力した中間データＳ² _n+2とを加算することで、対象領域Ｃ４内の出力データＸ（２ｎ＋４）を算出し、出力先選択部５５に出力する。この出力データＸ（２ｎ＋４）の算出処理は１クロック周期内に実行される。
【０１３６】
出力先選択部５５は、制御部５６から供給された選択制御信号ＳＥＬ３の値に従って、加算器５４から入力した出力データＸ（２ｎ＋４）を第２端子Ｋ１から出力し、他方の加算器４８から入力した中間データＳ² _n+4を第３端子Ｋ２から外部のＭＭＵ３１に出力し、ＭＭＵ３１は、その中間データＳ² _n+4をリングメモリ３２に転送し、参照済みの記憶領域中間データＳ¹ _n+4に上書きさせる。また、第２端子Ｋ１から出力された出力データＸ（２ｎ＋４）は分岐して外部のＭＭＵ３１にも出力され、ＭＭＵ３１は、その出力データＸ（２ｎ＋４）をリングメモリ３２に転送し、参照済みの記憶領域中間データＳ² _n+2に上書きさせる。
【０１３７】
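Next, the transformations of target regions C5 and C6 in the (N+2)-th processing (Fig. 18) are executed. In the clock cycle immediately before the operations in target regions C5 and C6, the normalization of target region N3 is executed. Here, target regions C5, C6 and N3 are the target regions C1, C2 and N1 of the N-th processing (Fig. 16), each shifted back by two sequences (two points). In target regions C5, C6 and N3, the same processing as in target regions C1, C2 and N1 is executed. Accordingly, in target region N3 the even-numbered input data Y(2n+10) is multiplied by the normalization coefficient κ to calculate the intermediate data S¹_{n+5}. In target region C5, a three-point operation is executed: the two intermediate data points S²_{n+3} and S²_{n+4} are added, the sum is multiplied by the lifting coefficient −γ, and the product is added to the intermediate data D¹_{n+3}. This yields the second-stage intermediate data D²_{n+3} on the sequence that starts from the odd-numbered input data Y(2n+7). In target region C6, a three-point operation is executed: the two output data points X(2n+2) and X(2n+4) are added, the sum is multiplied by the lifting coefficient −α, and the product is added to the intermediate data D²_{n+1}. This yields the output data X(2n+3) on the sequence that starts from the odd-numbered input data Y(2n+3).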
【０１３８】
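Next, the transformations of target regions C7 and C8 in the (N+3)-th processing (Fig. 19) are executed. In the clock cycle immediately before the operations in target regions C7 and C8, the normalization of target region N4 is executed. Here, target regions C7, C8 and N4 are the target regions C3, C4 and N2 of the (N+1)-th processing (Fig. 17), each shifted back by two sequences (two points). In target regions C7, C8 and N4, the same processing as in target regions C3, C4 and N2 is executed. Accordingly, in target region N4 the input data Y(2n+11) is multiplied by the normalization coefficient 1/κ to calculate the intermediate data D¹_{n+5}. In target region C7, a three-point operation is executed: the two odd-numbered intermediate data points D¹_{n+4} and D¹_{n+5} are added, the sum is multiplied by the lifting coefficient −δ, and the product is added to the even-numbered intermediate data S¹_{n+5}. This yields the second-stage intermediate data S²_{n+5} on the sequence that starts from the even-numbered input data Y(2n+10). In target region C8, a three-point operation is executed: the two intermediate data points D²_{n+2} and D²_{n+3} are added, the sum is multiplied by the lifting coefficient −β, and the product is added to the intermediate data S²_{n+3}. This yields the output data X(2n+6) on the sequence that starts from the even-numbered input data Y(2n+6).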
【０１３９】
以上のように、上記Ｎ回目処理（図１６）およびＮ＋１回目処理（図１７）と同様の処理が、全ての点の出力データが算出されるまで対象領域を移動させつつ繰り返し実行される。これにより、偶数番目或いは奇数番目の１点の出力データを算出するのに要する平均周期を１クロック周期とすることができ、出力データの算出周期を大幅に短縮化できる。
【０１４０】
次に、上記ウェーブレット変換装置３０を用いたラインベースの２次元逆ＤＷＴ処理を以下に説明する。
【０１４１】
水平フィルタリング部３３Ａに入力するサブバンド（帯域成分）は、図１１に示すように、サブバンド２３ＬＬおよび２３ＨＬ、あるいは、サブバンド２３ＬＨおよび２３ＨＨである。
【０１４２】
図１６〜図１９で示した入力データ・・・，Ｙ（ｎ−１），Ｙ（ｎ），Ｙ（ｎ＋１），・・・は、サブバンド２３ＬＬと２３ＨＬの水平方向のデータを交互に配列したデータ、あるいは、サブバンド２３ＬＨと２３ＨＨの水平方向のデータを交互に配列したデータである。そして、サブバンド２３ＬＬと２３ＨＬとからなる入力データに対して水平フィルタリングを施すことにより、サブバンド２３Ｌが出力され、サブバンド２３ＬＨと２３ＨＨとからなる入力データに対して水平フィルタリングを施すことによりサブバンド２３Ｈが出力される。図１６〜図１９で示した出力データ・・・，Ｘ（ｎ−１），Ｘ（ｎ），Ｘ（ｎ＋１），・・・は、サブバンド２３Ｌあるいはサブバンド２３Ｈの水平方向の１ラインのデータ列を示している。
【０１４３】
次に、垂直フィルタリング部３３Ｂが入力するサブバンドは、図１１に示すように、サブバンド２３Ｌおよびサブバンド２３Ｈである。この場合には、図１６〜図１９で示した入力データ・・・，Ｙ（ｎ−１），Ｙ（ｎ），Ｙ（ｎ＋１），・・・は、サブバンド２３Ｌと２３Ｈの垂直方向のデータを交互に配列したデータである。そして、サブバンド２３Ｌと２３Ｈとからなる入力データに対して垂直フィルタリングを施すことにより、画像データ２３が出力される。図１６〜図１９で示した出力データ・・・，Ｘ（ｎ−１），Ｘ（ｎ），Ｘ（ｎ＋１），・・・は、画像データ２３の垂直方向の１ラインのデータ列を示している。画像データ２３は、水平画素数Ｗ、垂直画素数Ｈを有する矩形状のデータである。
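The data flow of one level of two-dimensional synthesis, horizontal filtering followed by vertical filtering, can be sketched as below. This is a sketch under assumptions: `interleave` and `inverse_2d` are hypothetical names, subbands are plain lists of rows, and the one-dimensional synthesis is passed in as the parameter `inverse_1d` so that the example stays independent of any particular filter.

```python
def interleave(a: list, b: list) -> list:
    """Alternate elements: a[0], b[0], a[1], b[1], ..."""
    out = []
    for u, v in zip(a, b):
        out += [u, v]
    return out


def inverse_2d(ll, hl, lh, hh, inverse_1d):
    """One level of 2-D synthesis from four subbands of equal size."""
    # Horizontal pass: LL and HL rows give subband L, LH and HH rows give subband H.
    band_l = [inverse_1d(interleave(r0, r1)) for r0, r1 in zip(ll, hl)]
    band_h = [inverse_1d(interleave(r0, r1)) for r0, r1 in zip(lh, hh)]
    # Vertical pass: lines of L and H alternate, then each pixel column is filtered.
    rows = interleave(band_l, band_h)
    width = len(rows[0])
    cols = [inverse_1d([row[j] for row in rows]) for j in range(width)]
    # Transpose the filtered columns back into image rows.
    return [[cols[j][i] for j in range(width)] for i in range(len(rows))]


# With an identity "filter" the result shows only the data layout.
layout = inverse_2d([[1]], [[2]], [[3]], [[4]], lambda seq: list(seq))
```

With the identity function in place of a real filter, the output makes the placement visible: LL lands on even rows and even columns, HL on even rows and odd columns, LH on odd rows and even columns, HH on odd rows and odd columns.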
【０１４４】
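Subbands 23LL, 23HL, 23LH and 23HH are rectangular data with H/2 vertical pixels and W/2 horizontal pixels. As shown schematically in Fig. 20, they are input to the horizontal filtering unit 33A as data sequences …, Y_i(2n), Y_i(2n+1), Y_i(2n+2), … arranged in the vertical direction, with subband 23LL (even rows, even columns) and subband 23HL (even rows, odd columns) forming one set, and subband 23LH (odd rows, even columns) and subband 23HH (odd rows, odd columns) forming the other. That is, each pixel row (a horizontal data sequence in the figure) in the storage area 58L is a data sequence in which the pixels of each horizontal line of subbands 23LL and 23HL are arranged alternately, and each pixel row input to the storage area 58H is a data sequence in which the pixels of each horizontal line of subbands 23LH and 23HH are arranged alternately. The subscript i of input data Y_i(k) denotes the number of the pixel column to which Y_i(k) belongs, and takes the values i = 0, 1, …, W−1 (W: number of horizontal pixels). In the figure, the even-row storage area 58L, holding subbands 23LL and 23HL as one set, and the odd-row storage area 58H, holding subbands 23LH and 23HH as one set, are drawn as two separate areas, but the data layout in memory is not limited to this.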
【０１４５】
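Specifically, the first ring memory 32A and the horizontal filtering unit 33A repeatedly execute each round of processing, including the N-th processing (Fig. 16) and the (N+1)-th processing (Fig. 17), pixel by pixel, alternating between the low-band side (storage area 58L) and the high-band side (storage area 58H).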
【０１４６】
例えば、上記Ｎ回目処理（図１６）が、メモリ領域５８Ｌ側の１番目の画素行に対して１回実行された後に、上記Ｎ＋１回目処理（図１７）が１回実行され、更に、上記Ｎ＋２回目処理（図１８）が１回実行され、・・・といった処理が行われる。同様に、記憶領域５８Ｈ側の１番目の画素行に対して実行され、次に、記憶領域５８Ｌ側の２番目の画素行に対して実行された後に、記憶領域５８Ｈ側の２番目の画素行に対して実行され、次に、記憶領域５８Ｌ側の３番目の画素行に対して実行された後に、記憶領域５８Ｈ側の３番目の画素行に対して実行され、・・・、最終的に、記憶領域５８Ｌ側のＨ／２番目の画素行に対して実行された後に、記憶領域５８Ｈ側のＨ／２番目の画素行に対して実行される。
【０１４７】
なお、第１リングメモリ３２Ａは、図２１に模式的に示すように、入力データ…，Ｘ_j（ｋ），Ｘ_j+1（ｋ），…に対応する９点（９画素）のデータを保持する記憶領域５９を有しており、上記一時データや中間データを保持することができる。
【０１４８】
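As a result, the horizontal filtering unit 33A alternately and continuously outputs, one horizontal line at a time, the lines of subband 23L (H/2 lines high), obtained by synthesizing subbands 23LL and 23HL, and the lines of subband 23H (H/2 lines high), obtained by synthesizing subbands 23LH and 23HH.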
【０１４９】
そして、サブバンド２３Ｌの水平ラインとサブバンド２３Ｈの水平ラインとが、交互に配列されたデータが、垂直ラインのデータとして、そのまま第２リングメモリ３２Ｂに出力され垂直フィルタリング部３３Ｂで処理される。
【０１５０】
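Specifically, the second ring memory 32B and the vertical filtering unit 33B repeatedly execute each round of processing, including the N-th processing (Fig. 16) and the (N+1)-th processing (Fig. 17), for every pixel column, one horizontal line at a time. For example, the N-th processing (Fig. 16) is executed on the 0th pixel column, then on the 1st pixel column, then on the 2nd pixel column, and so on, and finally on the (W−1)-th pixel column. Next, the (N+1)-th processing (Fig. 17) is executed on the 0th pixel column, then on the 1st pixel column, then on the 2nd pixel column, and so on, and finally on the (W−1)-th pixel column. In this way, each round of processing is executed in turn for all pixel columns. As shown schematically in Fig. 20, the second ring memory 32B has a storage area 58 that holds 9×W points (9 lines) of data corresponding to the input data sequences, and can hold the temporary data and intermediate data described above.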
【０１５１】
この結果、垂直フィルタリング部３３Ｂは、水平ライン単位で入力するデータ行から画像データ２３を出力するのである。
【０１５２】
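By executing the above processing recursively, band components of any decomposition level can be synthesized and the image data restored. That is, by recursively inputting the subbands LL(k+1), HL(k+1), LH(k+1) and HH(k+1) of decomposition level k+1 (k is an integer) to the wavelet transform apparatus 30, the level-k subband LL(k) can be obtained.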
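The recursion over decomposition levels amounts to a simple loop. In the sketch below, `reconstruct` is a hypothetical name and `synthesize` stands for one pass through the apparatus (four subbands in, one LL subband out); both are assumptions made for illustration.

```python
def reconstruct(ll_top, details, synthesize):
    """Undo a multi-level decomposition.

    ll_top     : LL subband of the highest decomposition level
    details    : list of (hl, lh, hh) tuples, highest level first
    synthesize : function (ll, hl, lh, hh) -> LL subband of the next lower level
    """
    ll = ll_top
    for hl, lh, hh in details:
        ll = synthesize(ll, hl, lh, hh)  # LL(k+1), HL, LH, HH -> LL(k)
    return ll


# Stand-in synthesis on plain numbers, just to show the order of calls.
calls = []

def fake_synthesize(ll, hl, lh, hh):
    calls.append((ll, hl, lh, hh))
    return ll + hl + lh + hh

image = reconstruct(1, [(1, 1, 1), (2, 2, 2)], fake_synthesize)
```

The output of each pass becomes the LL input of the next, so a three-level decomposition needs three passes.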
【０１５３】
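As described above, the wavelet transform apparatus 30 according to this embodiment includes the horizontal filtering unit 33A and the vertical filtering unit 33B having the configuration shown in Fig. 15, so the calculation period of the output data can be shortened. The line-based two-dimensional wavelet transform can therefore be performed at high speed.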
【０１５４】
そして、第２の実施形態においては、第１の実施の形態において必要であった、水平フィルタリング部３３Ａの出力を記憶するバッファが不要である。第１の実施形態においては、水平フィルタリング部４Ａが４クロックで１画素を出力し、垂直フィルタリング部４Ｂが４クロックで１画素を入力する構成であったが、水平フィルタリング４ＡがＮ＋６回目処理（図９）およびＮ＋７回目処理（図１０）において、連続的に、垂直ラインを出力するのに対して、垂直フィルタリング部４Ｂでは、Ｎ回目処理（図３）で垂直ラインを入力した後、Ｎ＋４回目処理（図７）までは、垂直ラインを入力しない。このためバッファが必要であった。これに対して、第２の実施形態においては、水平フィルタリング部３３Ａが各回処理において垂直ラインを出力し、垂直フィルタリング３３Ｂが各回処理において垂直ラインを入力するので、バッファが不要となるのである。
【０１５５】
＜第３の実施形態＞
次に、本発明の第３の実施形態に係るウェーブレット変換装置およびウェーブレット変換方法について説明する。本実施形態に係るウェーブレット変換装置は、水平フィルタリング部と垂直フィルタリング部を除いて、上記第２の実施形態に係るウェーブレット変換装置３０（図１４）の構成と同じ構成を有する。ただし、第２の実施形態においては第１，第２リングメモリ３２Ａ，３２Ｂは、それぞれ９点、９ラインのリングメモリであったが、この実施の形態においては、第１，第２リングメモリ３２Ａ，３２Ｂは、それぞれ８点、８ラインのリングメモリである。
【０１５６】
図２２は、第３の実施形態に係るフィルタリング部３３ｓの概略構成を示す図である。このフィルタリング部３３ｓは、水平フィルタリング部または垂直フィルタリング部を示し、また、リングメモリ３２ｓは、図１４に示した第１リングメモリ３２Ａまたは第２リングメモリ３２Ｂの何れかを示すものとする。
【０１５７】
このフィルタリング部３３ｓは、リングメモリ３２ｓから入力データを選択的に取り込む第１，第２データ・セレクタ６０，６５、遅延レジスタ６４、第１〜第５係数乗算器６１，６６，７１，７６，８１、加算器７０，７５，８０，８５、出力先選択部（ＤＭＵＸ）８６、および制御部８７を備えて構成される。これら構成要素のうち、第２係数乗算器６６と加算器７０の組は、２点のデータを上記ステップａ或いはステップｂ（図３８）の方法で処理する２点演算部を構成する。その他、第３係数乗算器７１と加算器７５の組、第４係数乗算器７６と加算器８０の組、および第５係数乗算器８１と加算器８５の組も同様に２点演算部を構成している。また、これら２点演算部と出力先選択部８６とで中間データ算出手段が構成される。
【０１５８】
制御部８７は、画素クロック信号ＰＣＬＫと同期して動作する。第１データ・セレクタ６０は、この制御部８７から供給される選択制御信号ＳＥＬ０の値に応じて、リングメモリ３２ｓから取り込んだデータを第１端子Ｓ０〜第８端子Ｓ７の何れかから選択的に出力する。
【０１５９】
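The data output from the first terminal S0 of the first data selector 60 is input to the first coefficient multiplier 61. In the first coefficient multiplier 61, the coefficient register 62 outputs one of the normalization coefficients κ and 1/κ to the multiplier 63 according to the value of the control signal C0 supplied from the control unit 87, and the multiplier 63 multiplies the input data by that normalization coefficient. The output data of the multiplier 63 is input to the delay register 64. This normalization in the first coefficient multiplier 61 is executed within one clock cycle. The first coefficient multiplier 61 and the delay register 64 together constitute the normalization means. The output of the delay register 64 is input to the second data selector 65 and is also branched to the MMU 31.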
【０１６０】
第２データ・セレクタ６５は、制御部８７から供給される選択制御信号ＳＥＬ１の値に応じて、遅延レジスタ６４および第１データ・セレクタ６０から取り込んだデータを第１端子Ｓ０〜第８端子Ｓ７の何れかから選択的に出力する。第２〜第５係数乗算器６６，７１，７６，８１は、それぞれ、制御信号Ｃ１〜Ｃ４に従って入力データにリフティング係数−α，−β，−γ，−δを乗算する回路である。係数レジスタ６７，７２，７７，８２は、制御信号Ｃ１〜Ｃ４を受けて、リフティング係数α，β，γ，δをそれぞれ乗算器６８，７３，７８，８３に出力する。乗算器６８，７３，７８，８３は、それぞれ、第２データ・セレクタ６５の出力端子Ｓ０，Ｓ２，Ｓ４，Ｓ６から入力するデータにリフティング係数α，β，γ，δを乗算して出力する。２の補数演算回路６９，７４，７９，８４は、それぞれ乗算器６８，７３，７８，８３からの出力データの符号を反転させる。加算器７０，７５，８０，８５は、それぞれ、第２〜第５係数乗算器６６，７１，７６，８１から入力したデータと、第２データ・セレクタ６５の出力端子Ｓ１，Ｓ３，Ｓ５，Ｓ７から入力したデータとを加算して出力先選択部８６に出力する。
【０１６１】
出力先選択部８６は、制御部８７から供給される選択制御信号ＳＥＬ２の値に応じて、加算器７０，７５，８０，８５から並列に入力する４点のデータを第１端子Ｋ０〜第５端子Ｋ４から出力する。第１端子Ｋ０および第２端子Ｋ１から出力されたデータは合成データとして外部に出力される。また、第２端子Ｋ１から分岐されたデータおよび第３端子Ｋ２〜第５端子Ｋ４から出力されたデータは、ＭＭＵ３１に入力される。ＭＭＵ３１は、これら第２端子Ｋ１〜第５端子Ｋ４からＭＭＵ３１へ出力されたデータをリングメモリ３２ｓに転送し記憶させることができる。
【０１６２】
次に、図２２に示すフィルタリング部３３ｓを用いたリフティング演算の代表例を、図２３〜図２５を参照しつつ以下に説明する。この格子図の演算は、図３７の場合と同様に行われる。なお、図２３〜図２５では、説明の便宜上、各格子点間を結ぶ線分に対応するリフティング係数−α，−β，−γ，−δと規格化係数κ，１／κとを表示していない。
【０１６３】
図２３は、Ｎ回目処理（Ｎ：整数）が終了した時点の格子図を示し、図２４、図２５は、それぞれＮ＋１回目、Ｎ＋２回目の処理を模式的に示している。Ｎ回目処理（図２３）では、対象領域Ａ１，Ａ２，Ｂ１，Ｂ２の４個の変換処理が１クロック周期内に並列に同時実行される。対象領域Ａ１では、２点の中間データD¹ _n+2，Ｓ² _n+2を用いた上記ステップａ（図３８）の２点演算を実行して、奇数番目の入力データＹ（２ｎ＋５）を始点とする系列上の第２段階の一時データ（Ｄ² _n+2）を算出する。ここで、中間データＳ² _n+2は、中間データＤ¹ _n+2の系列に対して１点前の系列上のデータである。また、対象領域Ａ２では、２点のデータＤ² _n，Ｘ（２ｎ）を用いた上記ステップａの２点演算を実行して、奇数番目の入力データＹ（２ｎ＋１）を始点とする系列上の出力一時データ（Ｘ（２ｎ＋１））を算出する。また、対象領域Ｂ１では、一時データ（Ｓ² _n+3）と１クロック周期前の演算処理で算出された中間データＤ¹ _n+3とを用いた上記ステップｂ（図３８）の２点演算を実行して、偶数番目の入力データＹ（２ｎ＋６）を始点とする系列上の第２段階の中間データＳ² _n+3を算出する。ここで、中間データＤ¹ _n+3は、一時データ（Ｓ² _n+3）の系列に対して１点後の系列上のデータである。また、対象領域Ｂ２では、出力一時データ（Ｘ（２ｎ＋２））と中間データＤ² _n+1とを用いた上記ステップｂの２点演算を実行して、偶数番目の入力データＹ（２ｎ＋２）を始点とする系列上の出力データＸ（２ｎ＋２）を算出する。
【０１６４】
また、対象領域Ａ１，Ａ２，Ｂ１，Ｂ２における上記並列処理の１クロック前の周期において、対象領域Ｎ１の規格化処理が行なわれる。対象領域Ｎ１においては、入力データＹ（２ｎ＋７）に規格化係数１／κを乗算する規格化処理が実行される。
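The two-point operations of this embodiment split each three-point operation into two halves that run in consecutive rounds. The sketch below illustrates that split; the names `step_a` and `step_b` follow the step labels used in the text, but the exact form of each step is inferred from the worked example for X(2n+1) and should be read as an assumption.

```python
def step_a(prev_neighbor: float, target: float, coeff: float) -> float:
    """First half: temporary = target - coeff * (neighbour one point before)."""
    return target - coeff * prev_neighbor


def step_b(next_neighbor: float, temporary: float, coeff: float) -> float:
    """Second half: result = temporary - coeff * (neighbour one point after)."""
    return temporary - coeff * next_neighbor


def three_point(a: float, b: float, c: float, coeff: float) -> float:
    """Reference: the undivided three-point operation c - coeff * (a + b)."""
    return c - coeff * (a + b)


# Illustrative values: the two halves together match the three-point result.
temp = step_a(1.0, 10.0, 0.5)   # e.g. (X(2n+1)) from X(2n) and D2_n
final = step_b(3.0, temp, 0.5)  # e.g. X(2n+1) from X(2n+2) and (X(2n+1))
```

Splitting the operation this way means each unit needs only one multiplier input and one adder per clock cycle, at the cost of holding the temporary value in the ring memory for one round.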
【０１６５】
このＮ回目の具体的な処理の内容は次の通りである。リングメモリ３２ｓは８ライン（系列）の記憶領域を備えている。
Ｎ回目処理においては、対象領域Ａ１，Ａ２，Ｂ１，Ｂ２内の演算処理が１クロック周期内に行なわれるが、この演算処理の１クロック周期前において、対象領域Ｎ１内の演算処理が行なわれる。この１クロック前の周期における処理から説明する。ＭＭＵ３１は、リングメモリ３２ｓに一時記憶された入力データＹ（２ｎ＋７）を第１データ・セレクタ６０に出力する。第１データ・セレクタ６０は、制御部８７からの選択制御信号ＳＥＬ０の値に応じて、入力データＹ（２ｎ＋７）を第１端子Ｓ０から出力する。
【０１６６】
第１端子Ｓ０から出力された入力データＹ（２ｎ＋７）は、第１係数乗算器６１に入力される。第１係数乗算器６１において、係数レジスタ６２は、制御部８７から供給された制御信号Ｃ０に従って、２個の規格化係数κ，１／κのうち規格化係数１／κを乗算器６３に出力し、乗算器６３は入力データＹ（２ｎ＋７）に規格化係数１／κを乗算する。この結果、第１係数乗算器６１は、データＤ¹ _n+3（＝（１／κ）×Ｙ（２ｎ＋７））を算出する。乗算器６３の出力は、遅延レジスタ６４に入力される。以上の処理が、対象領域Ａ１，Ａ２，Ｂ１，Ｂ２内の演算処理が行なわれる１クロック前の周期において実行される。
【０１６７】
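In the next clock cycle, the MMU 31 causes the ring memory 32s to output the seven temporarily stored data points X(2n), D²_n, (X(2n+2)), D²_{n+1}, S²_{n+2}, D¹_{n+2}, (S²_{n+3}) to the first data selector 60. The first data selector 60 outputs these seven data points to the second data selector 65 according to the value of the selection control signal SEL0 supplied from the control unit 87. The data D¹_{n+3} stored in the delay register 64 is also output to the second data selector 65. The intermediate data D¹_{n+3} output from the delay register 64 is in addition branched to the external MMU 31, which transfers it to the ring memory 32s, overwriting the storage area of the already-referenced input data Y(2n+7).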
【０１６８】
第２データ・セレクタ６５は、制御部８７から供給される選択制御信号ＳＥＬ１の値に応じて、８点のデータのうち対象領域Ａ２内の２点の出力データＸ（２ｎ），Ｄ² _nを選択して第１端子Ｓ０と第２端子Ｓ１とに出力し、対象領域Ｂ２内の中間データＤ² _n+1と一時データ（Ｘ（２ｎ＋２））とを第３端子Ｓ２と第４端子Ｓ３とから出力し、対象領域Ａ１内の中間データＳ² _n+2とＤ¹ _n+2とを第５端子Ｓ４と第６端子Ｓ５とから出力し、対象領域Ｂ１内の中間データＤ¹ _n+3と一時データ（Ｓ² _n+3）とを第７端子Ｓ６と第８端子Ｓ７とから出力する。
【０１６９】
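In the second coefficient multiplier 66, the coefficient register 67 outputs the lifting coefficient α to the multiplier 68 according to the control signal C1 supplied from the control unit 87, and the multiplier 68 multiplies the data X(2n) input from the first terminal S0 by the lifting coefficient α and outputs the product α×X(2n). The output data of the multiplier 68 has its sign inverted in the two's complement circuit 69 and is output to the adder 70. The adder 70 adds the data −α×X(2n) output from the second coefficient multiplier 66 to the data D²_n input from the second terminal S1 of the second data selector 65, thereby calculating the temporary data (X(2n+1)) in target region A2, and outputs it to the output destination selector 86.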
【０１７０】
また、第３係数乗算器７１では、係数レジスタ７２は、制御部８７から供給された制御信号Ｃ２に応じてリフティング係数βを乗算器７３に出力し、乗算器７３は、第３端子Ｓ２から入力した中間データＤ² _n+1にリフティング係数βを乗算して得たデータβ×Ｄ² _n+1を出力する。乗算器７３の出力は、２の補数演算回路７４において符号が反転された後、加算器７５に出力される。加算器７５は、第３係数乗算器７１から出力されたデータ−β×Ｄ² _n+1と、第２データ・セレクタ６５の第４端子Ｓ３から入力した出力一時データ（Ｘ（２ｎ＋２））とを加算することで、対象領域Ｂ２内の出力データＸ（２ｎ＋２）を算出し、出力先選択部８６に出力する。
【０１７１】
また、第４係数乗算器７６では、係数レジスタ７７は、制御部８７から供給された制御信号Ｃ３に応じてリフティング係数γを乗算器７８に出力し、乗算器７８は、第５端子Ｓ４から入力した中間データＳ² _n+2にリフティング係数γを乗算して得たデータγ×Ｓ² _n+2を出力する。乗算器７８の出力は、２の補数演算回路７９において符号が反転された後、加算器８０に出力される。加算器８０は、第４係数乗算器７６から出力されたデータ−γ×Ｓ² _n+2と、第２データ・セレクタ６５の第６端子Ｓ５から入力したデータＤ¹ _n+2とを加算することで、対象領域Ａ１内の一時データ（Ｄ² _n+2）を算出し、出力先選択部８６に出力する。
【０１７２】
また、第５係数乗算器８１では、係数レジスタ８２は、制御部８７から供給された制御信号Ｃ４に応じてリフティング係数δを乗算器８３に出力し、乗算器８３は、第７端子Ｓ６から入力した中間データＤ¹ _n+3にリフティング係数δを乗算して得たデータδ×Ｄ¹ _n+3を出力する。乗算器８３の出力は、２の補数演算回路８４において符号が反転された後、加算器８５に出力される。加算器８５は、第５係数乗算器８１から出力されたデータ−δ×Ｄ¹ _n+3と、第２データ・セレクタ６５の第８端子Ｓ７から入力した一時データ（Ｓ² _n+3）とを加算することで、対象領域Ｂ１内の第２段階の中間データＳ² _n+3を算出し、出力先選択部８６に出力する。
【０１７３】
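According to the value of the selection control signal SEL2 supplied from the control unit 87, the output destination selector 86 outputs the output data X(2n+2) input from the adder 75 to the outside from its second terminal K1. The output data X(2n+2) is also output to the MMU 31. In addition, according to the selection control signal SEL2, the output destination selector 86 outputs the three data points (X(2n+1)), (D²_{n+2}), S²_{n+3} input from the adders 70, 80 and 85 to the MMU 31 from its third terminal K2 through fifth terminal K4. The MMU 31 transfers the four data points (X(2n+1)), X(2n+2), (D²_{n+2}), S²_{n+3} output from the filtering unit 33s to the ring memory 32s, overwriting the storage areas of the already-referenced data D²_n, (X(2n+2)), D¹_{n+2}, (S²_{n+3}).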
【０１７４】
次に、Ｎ＋１回目処理（図２４）における対象領域Ａ３，Ａ４，Ｂ３，Ｂ４における変換処理が並列に同時実行される。対象領域Ａ３では、１クロック周期前の演算処理で算出された中間データＳ¹ _n+4と中間データＤ¹ _n+3を用いた上記ステップａ（図３８）の２点演算を実行して、偶数番目の入力データＹ（２ｎ＋８）を始点とする系列上の第２段階の一時データ（Ｓ² _n+4）を算出する。ここで、中間データＤ¹ _n+3は、中間データＳ¹ _n+4の系列に対して１点前の系列上のデータである。また、対象領域Ａ４では、２点のデータＳ² _n+2，Ｄ² _n+1を用いた上記ステップａの２点演算を実行して、偶数番目の入力データＹ（２ｎ＋４）を始点とする系列上の出力一時データ（Ｘ（２ｎ＋４））を算出する。また、対象領域Ｂ３では、一時データ（Ｄ² _n+2）と中間データＳ² _n+3とを用いた上記ステップｂ（図３８）の２点演算を実行して、奇数番目の入力データＹ（２ｎ＋５）を始点とする系列上の第２段階の中間データＤ² _n+2を算出する。ここで、中間データＳ² _n+3は、一時データ（Ｄ² _n+2）の系列に対して１点後の系列上のデータである。また、対象領域Ｂ４では、出力一時データ（Ｘ（２ｎ＋１））と出力データＸ（２ｎ＋２）とを用いた上記ステップｂの２点演算を実行して、奇数番目の入力データＹ（２ｎ＋１）を始点とする系列上の出力データＸ（２ｎ＋１）を算出する。
【０１７５】
また、対象領域Ａ３，Ａ４，Ｂ３，Ｂ４における上記並列処理の１クロック前の周期において、対象領域Ｎ２の規格化処理が行なわれる。対象領域Ｎ２では、入力データＹ（２ｎ＋８）に規格化係数κを乗算する規格化処理が実行される。
【０１７６】
次に、Ｎ＋１回目の具体的な処理の内容は次の通りである。１クロック前の周期の対象領域Ｎ２における処理から説明する。ＭＭＵ３１は、リングメモリ３２ｓに一時記憶された入力データＹ（２ｎ＋８）を第１データ・セレクタ６０に出力する。第１データ・セレクタ６０は、制御部８７からの選択制御信号ＳＥＬ０の値に応じて、入力データＹ（２ｎ＋８）を第１端子Ｓ０から出力する。
【０１７７】
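The input data Y(2n+8) output from the first terminal S0 is input to the first coefficient multiplier 61. In the first coefficient multiplier 61, the coefficient register 62 outputs the normalization coefficient κ, of the two normalization coefficients κ and 1/κ, to the multiplier 63 according to the control signal C0 supplied from the control unit 87, and the multiplier 63 multiplies the input data Y(2n+8) by the normalization coefficient κ. As a result, the first coefficient multiplier 61 calculates the data S¹_{n+4} (= κ×Y(2n+8)). The output of the multiplier 63 is input to the delay register 64. The above processing is executed in the clock cycle immediately before the operations in target regions A3, A4, B3 and B4.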
【０１７８】
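In the next clock cycle, the MMU 31 causes the ring memory 32s to output the seven temporarily stored data points (X(2n+1)), X(2n+2), D²_{n+1}, S²_{n+2}, (D²_{n+2}), S²_{n+3}, D¹_{n+3} to the first data selector 60. The first data selector 60 outputs these seven data points to the second data selector 65 according to the value of the selection control signal SEL0 supplied from the control unit 87. The intermediate data S¹_{n+4} stored in the delay register 64 is also output to the second data selector 65. The intermediate data S¹_{n+4} output from the delay register 64 is in addition branched to the external MMU 31, which transfers it to the ring memory 32s, overwriting the storage area of the already-referenced input data Y(2n+8).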
【０１７９】
第２データ・セレクタ６５は、制御部８７から供給される選択制御信号ＳＥＬ１の値に応じて、８点のデータのうち対象領域Ｂ４内の２点の入力データＸ（２ｎ＋２），（Ｘ（２ｎ＋１））を選択して第１端子Ｓ０と第２端子Ｓ１とに出力し、対象領域Ａ４内の中間データＤ² _n+1，Ｓ² _n+2とを第３端子Ｓ２と第４端子Ｓ３とから出力し、対象領域Ｂ３内の中間データＳ² _n+3と一時データ（Ｄ² _n+2）とを第５端子Ｓ４と第６端子Ｓ５とから出力し、対象領域Ａ３内の中間データＤ¹ _n+3とＳ¹ _n+4とを第７端子Ｓ６と第８端子Ｓ７とから出力する。
【０１８０】
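In the second coefficient multiplier 66, the coefficient register 67 outputs the lifting coefficient α to the multiplier 68 according to the control signal C1 supplied from the control unit 87, and the multiplier 68 multiplies the data X(2n+2) input from the first terminal S0 by the lifting coefficient α and outputs the product α×X(2n+2). The output data of the multiplier 68 has its sign inverted in the two's complement circuit 69 and is output to the adder 70. The adder 70 adds the data −α×X(2n+2) output from the second coefficient multiplier 66 to the temporary data (X(2n+1)) input from the second terminal S1 of the second data selector 65, thereby calculating the output data X(2n+1) in target region B4, and outputs it to the output destination selector 86.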
【０１８１】
また、第３係数乗算器７１では、係数レジスタ７２は、制御部８７から供給された制御信号Ｃ２に応じてリフティング係数βを乗算器７３に出力し、乗算器７３は、第３端子Ｓ２から入力した中間データＤ² _n+1にリフティング係数βを乗算して得たデータβ×Ｄ² _n+1を出力する。乗算器７３の出力は、２の補数演算回路７４において符号が反転された後、加算器７５に出力される。加算器７５は、第３係数乗算器７１から出力されたデータ−β×Ｄ² _n+1と、第２データ・セレクタ６５の第４端子Ｓ３から入力した中間データＳ² _n+2とを加算することで、対象領域Ａ４内の出力一時データ（Ｘ（２ｎ＋４））を算出し、出力先選択部８６に出力する。
【０１８２】
また、第４係数乗算器７６では、係数レジスタ７７は、制御部８７から供給された制御信号Ｃ３に応じてリフティング係数γを乗算器７８に出力し、乗算器７８は、第５端子Ｓ４から入力した中間データＳ² _n+3にリフティング係数γを乗算して得たデータγ×Ｓ² _n+3を出力する。乗算器７８の出力は、２の補数演算回路７９において符号が反転された後、加算器８０に出力される。加算器８０は、第４係数乗算器７６から出力されたデータ−γ×Ｓ² _n+3と、第２データ・セレクタ６５の第６端子Ｓ５から入力した一時データ（Ｄ² _n+2）とを加算することで、対象領域Ｂ３内の中間データＤ² _n+2を算出し、出力先選択部８６に出力する。
【０１８３】
また、第５係数乗算器８１では、係数レジスタ８２は、制御部８７から供給された制御信号Ｃ４に応じてリフティング係数δを乗算器８３に出力し、乗算器８３は、第７端子Ｓ６から入力した中間データＤ¹ _n+3にリフティング係数δを乗算して得たデータδ×Ｄ¹ _n+3を出力する。乗算器８３の出力は、２の補数演算回路８４において符号が反転された後、加算器８５に出力される。加算器８５は、第５係数乗算器８１から出力されたデータ−δ×Ｄ¹ _n+3と、第２データ・セレクタ６５の第８端子Ｓ７から入力した中間データＳ¹ _n+4とを加算することで、対象領域Ａ３内の第２段階の中間データＳ² _n+4を算出し、出力先選択部８６に出力する。
【０１８４】
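According to the value of the selection control signal SEL2 supplied from the control unit 87, the output destination selector 86 outputs the output data X(2n+1) input from the adder 70 to the outside from its first terminal K0. In addition, according to the selection control signal SEL2, the output destination selector 86 outputs the three data points (X(2n+4)), D²_{n+2}, (S²_{n+4}) input from the adders 75, 80 and 85 to the MMU 31 from its third terminal K2 through fifth terminal K4. The MMU 31 transfers these three data points (X(2n+4)), D²_{n+2}, (S²_{n+4}) output from the filtering unit 33s to the ring memory 32s, overwriting the storage areas of the already-referenced data S²_{n+2}, (D²_{n+2}), S¹_{n+4}.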
【０１８５】
次に、Ｎ＋２回目処理（図２５）における対象領域Ａ５，Ａ６，Ｂ５，Ｂ６の４個の変換処理が１クロック周期内に並列に同時実行される。また、対象領域Ａ５，Ａ６，Ｂ５，Ｂ６における上記並列処理の１クロック前の周期において、対象領域Ｎ３の規格化処理が行なわれる。
【０１８６】
対象領域Ａ６，Ｂ６，Ａ５，Ｂ５，Ｎ３は、それぞれ、上記Ｎ回目処理（図２３）の対象領域Ａ２，Ｂ２，Ａ１，Ｂ１，Ｎ１を２系列（２点）後方に移動した領域である。これら対象領域Ａ６，Ｂ６，Ａ５，Ｂ５，Ｎ３では、それぞれ、対象領域Ａ２，Ｂ２，Ａ１，Ｂ１，Ｎ１における処理と同様の処理が実行される。この結果として、対象領域Ａ６では一時データ（Ｘ（２ｎ＋３））が、対象領域Ｂ６では出力データＸ（２ｎ＋４）が、対象領域Ａ５では一時データ（Ｄ² _n+3）が、対象領域Ｂ５では中間データＳ² _n+4が、対象領域Ｎ３では中間データＤ¹ _n+4がそれぞれ算出される。
【０１８７】
次に、Ｎ＋３回目処理（図示せず）においては、上記Ｎ＋１回目処理（図２４）の対象領域Ｂ４，Ａ４，Ｂ３，Ａ３，Ｎ２を２系列（２点）後方に移動した領域において、Ｎ＋１回目処理と同様の処理が行なわれる。
【０１８８】
以上のように、上記Ｎ回目処理（図２３）および上記Ｎ＋１回目処理（図２４）と同様の処理が、全ての出力データが算出されるまで対象領域を移動させつつ繰り返し実行される。これにより、偶数番目或いは奇数番目の１点の出力データを算出するのに要する平均周期を１クロック周期とすることができ、出力データの算出周期を大幅に短縮化できる。
【０１８９】
本実施形態に係るウェーブレット変換装置は、図２２に示す構成を有する水平フィルタリング部と垂直フィルタリング部とを備えるため、上記第２の実施形態の場合と同じラインベースの２次元逆ＤＷＴ処理を実行することが可能である。したがって、ウェーブレット変換を極めて短時間で高速に行うことが可能である。
【０１９０】
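In the third embodiment as well, as described for the second embodiment, the horizontal filtering unit 33s outputs a horizontal line in each round of processing and the vertical filtering unit 33s takes in pixel columns in each round of processing, so the line buffer circuit 5 required in the wavelet transform apparatus 1 of the first embodiment is unnecessary. A small-scale, low-power, inexpensive wavelet transform apparatus can therefore be realized.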
【０１９１】
＜変形例＞
図２６は、上記した第２および第３の実施形態の変形例に係る２次元ウェーブレット変換装置３０ａの概略構成を示す図である。このウェーブレット変換装置３０ａは、サブバンドの２次元画像データを一時的に保持するバッファ８８、外部供給のクロック信号ＣＬＫと同期して動作するＭＭＵ（メモリ管理部）８９、第１リングメモリ３２または３２ｓ、水平フィルタリング部３３または３３ｓ、第２リングメモリ３、垂直フィルタリング部４を備えて構成されている。
【０１９２】
ここで、第２リングメモリ３と垂直フィルタリング部４は、上記第１の実施形態に係るリングメモリ３とフィルタリング部４と同じ構成を有する。よって、本変形例の第２リングメモリ３Ｂと垂直フィルタリング部４Ｂは４ライン周期で１ラインの出力データを算出できる。
【０１９３】
また、第１リングメモリ３２または３２ｓと水平フィルタリング部３３または３３ｓとは、上記第２の実施形態に係るリングメモリ３２とフィルタリング部３３と、若しくは上記第３の実施形態に係るリングメモリ３２ｓとフィルタリング部３３ｓと同じ構成を有する。よって、本変形例の第１リングメモリ３２または３２ｓと水平フィルタリング部３３または３３ｓは１クロック周期で１点の出力データを算出できる。
【０１９４】
したがって、この変形例においては、水平フィルタリング部３３または３３ｓは、第１リングメモリ３２から４クロック周期間隔で入力データを取り込むように処理する。これにより、上記第１の実施形態に係るウェーブレット変換装置１（図１）のようにラインバッファ回路５を必要としない。したがって、メモリ使用量が少ない、小回路規模で低廉なウェーブレット変換装置の実現が可能となる。
【０１９５】
なお、本変形例では、第２リングメモリと垂直フィルタリング部として第１の実施形態に係る第２リングメモリ３Ｂと垂直フィルタリング部４Ｂを採用したが、この代わりに、第２リングメモリと垂直フィルタリング部として従来技術で説明したような平均５クロック周期で１点の出力データを算出する構成を採用してもよい。この場合には、水平フィルタリング部３３または３３ｓは、第１リングメモリ３２から５ライン周期間隔で入力データを取り込むように処理する。これにより、ラインバッファ回路５を必要としない構成とすることができる。
【０１９６】
＜第４の実施形態＞
次に、本発明の第４の実施形態に係るウェーブレット変換装置およびウェーブレット変換方法について説明する。図２７は、第４の実施形態に係るウェーブレット変換装置９０の概略構成を示す図である。このウェーブレット変換装置９０は、サブバンドの２次元画像データを一時的に保持するバッファ９１、外部供給のクロック信号ＣＬＫと同期して動作するＭＭＵ（メモリ管理部）９２、第１リングメモリ３２Ｈ、第１水平フィルタリング部３３Ｈ、第２リングメモリ３２Ｌ、第２水平フィルタリング部３３Ｌ、第３リングメモリ９３および垂直フィルタリング部９４を備えて構成されている。ここで、第１リングメモリ３２Ｈ、第１水平フィルタリング部３３Ｈ、第２リングメモリ３２Ｌ、第２水平フィルタリング部３３Ｌ、第３リングメモリ９３および垂直フィルタリング部９４は、外部供給の画素クロック信号ＰＣＬＫと同期して動作する。
【０１９７】
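In this embodiment, the MMU 92, the first horizontal filtering unit 33H, the second horizontal filtering unit 33L and the vertical filtering unit 94 are implemented in hardware. Alternatively, they may be implemented as a computer program containing instructions executed by a microprocessor.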
【０１９８】
このウェーブレット変換装置９０は、２次元画像データにラインベースの２次元逆ＤＷＴを１回施す機能を有している。第１および第２水平フィルタリング部３３Ｈ，３３Ｌと垂直フィルタリング部９４とは、それぞれ第３リングメモリ９３を介して接続されている。
【０１９９】
ＭＭＵ９２は、バッファ９１、第１リングメモリ３２Ｈ、第２リングメモリ３２Ｌおよび第３リングメモリ９３のデータ入出力を制御する機能を有しており、バッファ９１から読出したサブバンドの２次元画像データを第１リングメモリ３２Ｈおよび第２リングメモリ３２Ｌに転送し記憶させることができる。
【０２００】
ここで、バッファ９１には、図１１で示した４つのサブバンドのデータ２３ＬＬ，２３ＨＬ，２３ＬＨ，２３ＨＨが入力され、第１リングメモリ３２Ｈには、サブバンド２３ＬＨと２３ＨＨの水平方向の画素が交互に配列された水平幅Ｗ、垂直高さＨ／２の画像データが入力され、第２リングメモリ３２Ｌには、サブバンド２３ＬＬと２３ＨＬの水平方向の画素が交互に配列された水平幅Ｗ、垂直高さＨ／２の画像データが入力される。
【０２０１】
第１水平フィルタリング部３３Ｈは、第１リングメモリ３２Ｈから入力したデータに対して２次元画像の水平方向にフィルタリングを実行することで、画素クロック信号ＰＣＬＫの１クロック周期で、サブバンド２３ＬＨと２３ＨＨとを合成した画像データであるサブバンド２３Ｈのデータを１点ずつ算出できる。このようにして算出されたサブバンド２３Ｈの画像データＹ_H（ｍ）が第３リングメモリ９３に転送される。
【０２０２】
第２水平フィルタリング部３３Ｌは、第２リングメモリ３２Ｌから入力したデータに対して２次元画像の水平方向にフィルタリングを実行することで、画素クロック信号ＰＣＬＫの１クロック周期で、サブバンド２３ＬＬと２３ＨＬとを合成した画像データであるサブバンド２３Ｌのデータを１点ずつ算出できる。このようにして算出されたサブバンド２３Ｌの画像データＹ_L（ｍ）が第３リングメモリ９３に転送される。
【０２０３】
これら第１水平フィルタリング部３３Ｈと第２水平フィルタリング部３３Ｌとしては、上記第２または第３の実施形態に係るフィルタリング部３３または３３ｓと同じ構成を採用すればよい。
【０２０４】
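Meanwhile, the vertical filtering unit 94 receives the image data Y_L(m) and Y_H(m) of subbands 23L and 23H from the third ring memory 93 and, for each pixel column, filters in the vertical direction the data in which the lines of Y_L(m) and Y_H(m) are arranged alternately in the vertical direction. It can thereby calculate, in one clock cycle of the pixel clock signal PCLK, the data of the vertical lines of image data 23 two points at a time in the horizontal direction.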
【０２０５】
図２８に、本実施形態に係る垂直フィルタリング部９４の概略構成を示す。この垂直フィルタリング部９４は、入力データを選択的に取り込む第１データ・セレクタ９５、第１および第２係数乗算器９６，１００、遅延レジスタ９９，１０３、第２データ・セレクタ１０４、前段の４つの加算器１０５，１１１，１１７，１２３、第３〜第６係数乗算器１０６，１１２，１１８，１２４、後段の４つの加算器１１０，１１６，１２２，１２８、出力先選択部（ＤＭＵＸ）１２９、および制御部１３０を備えて構成される。これら構成要素のうち、２個の加算器１０５，１１０と第３係数乗算器１０６からなる組は３点のデータを１クロック周期内に処理するため、３点演算部を構成する。また、２個の加算器１１１，１１６と第４係数乗算器１１２からなる組、２個の加算器１１７，１２２と第５係数乗算器１１８からなる組、および２個の加算器１２３，１２８と第６係数乗算器１２４からなる組も、それぞれ、３点のデータを１クロック周期内に処理するため、３点演算部を構成する。また、これら４組の３点演算部と出力先選択部１２９とで中間データ算出手段が構成される。
【０２０６】
制御部１３０は、画素クロック信号ＰＣＬＫと同期して動作する。第１データ・セレクタ９５は、この制御部１３０から供給される選択制御信号ＳＥＬ０の値に応じて、第３リングメモリ９３から取り込んだデータ（Ｙ_L（ｍ）およびＹ_H（ｍ）の垂直方向のラインを交互に配列したデータ）を第１端子Ｓ０〜第１２端子Ｓ１１の何れかから選択的に出力する。
【０２０７】
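The data output from the first terminal S0 and the second terminal S1 of the first data selector 95 is input to the first coefficient multiplier 96 and the second coefficient multiplier 100, respectively. In the first coefficient multiplier 96, the coefficient register 97 outputs the normalization coefficient κ to the multiplier 98 according to the control signal C0 supplied from the control unit 130, and the multiplier 98 multiplies the input data by the normalization coefficient κ and outputs the product to the delay register 99. In the second coefficient multiplier 100, the coefficient register 101 outputs the normalization coefficient 1/κ to the multiplier 102 according to the control signal C1 supplied from the control unit 130, and the multiplier 102 multiplies the input data by the normalization coefficient 1/κ and outputs the product to the delay register 103. The pair of the first coefficient multiplier 96 and the delay register 99, and the pair of the second coefficient multiplier 100 and the delay register 103, each constitute the normalization means of the present invention.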
【０２０８】
遅延レジスタ９９と遅延レジスタ１０３とに入力されたデータは、画素クロック信号ＰＣＬＫの１クロック周期遅延した後に、第２データ・セレクタ１０４に出力される。また、遅延レジスタ１０３に入力されたデータは、分岐してＭＭＵ９２に出力される。
【０２０９】
また、第１データ・セレクタ９５の第３端子Ｓ２〜第１２端子Ｓ１１から出力されたデータは、第２データ・セレクタ１０４に出力され、さらに、第２データ・セレクタ１０４は、制御部１３０から供給される選択制御信号ＳＥＬ１に応じて、各データを４組の３点演算部に出力し、これら３点演算部において並列処理が実行される。
【０２１０】
前段の加算器１０５は、第２データ・セレクタ１０４の第１端子Ｓ０と第２端子Ｓ１とから出力された２点のデータを加算して第３係数乗算器１０６に出力する。第３係数乗算器１０６では、係数レジスタ１０７は、制御部１３０から供給される制御信号Ｃ２に応じて、リフティング係数αを乗算器１０８に出力し、乗算器１０８は、加算器１０５から入力したデータにリフティング係数αを乗算する。その乗算出力は２の補数演算回路１０９において符号が反転されて後段の加算器１１０に出力される。そして、後段の加算器１１０は、第３係数乗算器１０６から入力したデータと、第２データ・セレクタ１０４の第３端子Ｓ２から入力したデータとを加算して出力先選択部１２９に出力する。
【０２１１】
また、前段の加算器１１１は、第２データ・セレクタ１０４の第４端子Ｓ３と第５端子Ｓ４とから出力された２点のデータを加算して第４係数乗算器１１２に出力する。第４係数乗算器１１２では、係数レジスタ１１３は、制御部１３０から供給される制御信号Ｃ３に応じて、リフティング係数βを乗算器１１４に出力し、乗算器１１４は、加算器１１１から入力したデータにリフティング係数βを乗算する。その乗算出力は２の補数演算回路１１５において符号が反転されて後段の加算器１１６に出力される。後段の加算器１１６は、第４係数乗算器１１２から入力したデータと、第２データ・セレクタ１０４の第６端子Ｓ５から入力したデータとを加算して出力先選択部１２９に出力する。
【０２１２】
また、前段の加算器１１７は、第２データ・セレクタ１０４の第７端子Ｓ６と第８端子Ｓ７とから出力された２点のデータを加算して第５係数乗算器１１８に出力する。第５係数乗算器１１８では、係数レジスタ１１９は、制御部１３０から供給される制御信号Ｃ４に応じて、リフティング係数γを乗算器１２０に出力し、乗算器１２０は、加算器１１７から入力したデータにリフティング係数γを乗算する。その乗算出力は２の補数演算回路１２１において符号が反転されて後段の加算器１２２に出力される。後段の加算器１２２は、第５係数乗算器１１８から入力したデータと、第２データ・セレクタ１０４の第９端子Ｓ８から入力したデータとを加算して出力先選択部１２９に出力する。
【０２１３】
また、前段の加算器１２３は、第２データ・セレクタ１０４の第１０端子Ｓ９と第１１端子Ｓ１０とから出力された２点のデータを加算して第６係数乗算器１２４に出力する。第６係数乗算器１２４では、係数レジスタ１２５は、制御部１３０から供給される制御信号Ｃ５に応じて、リフティング係数δを乗算器１２６に出力し、乗算器１２６は、加算器１２３から入力したデータにリフティング係数δを乗算する。その乗算出力は２の補数演算回路１２７において符号が反転されて後段の加算器１２８に出力される。後段の加算器１２８は、第６係数乗算器１２４から入力したデータと、第２データ・セレクタ１０４の第１２端子Ｓ１１から入力したデータとを加算して出力先選択部１２９に出力する。
【０２１４】
出力先選択部１２９は、制御部１３０から供給される選択制御信号ＳＥＬ２の値に応じて、後段の加算器１１０，１１６，１２２，１２８から並列に入力する４点のデータを第１端子Ｋ０〜第４端子Ｋ３の何れかから選択的に出力する。
【０２１５】
出力先選択部１２９は、第１端子Ｋ０と第２端子Ｋ１から出力データＸ（２ｋ）およびＸ（２ｋ＋１）とを出力する。また、出力先選択部１２９の第１端子Ｋ０、第３端子Ｋ２、第４端子Ｋ３から出力されたデータはＭＭＵ９２にも出力される。ＭＭＵ９２は、第１端子Ｋ０、第３端子Ｋ２、第４端子Ｋ３から出力されたデータを第３リングメモリ９３に転送し、参照済みの記憶領域に上書きさせることができる。
【０２１６】
次に、以上の垂直フィルタリング部９４を用いたリフティング演算の代表例を、図２９〜図３１を参照しつつ以下に説明する。図２９〜図３１は、９×７タップのDaubechiesフィルタのリフティング構成を模式的に示す格子図である。この格子図の演算は、図３７の場合と同様に行われる。なお、図２９〜図３１は、説明の便宜上、各格子点間を結ぶ線分に対応するリフティング係数−α，−β，−γ，−δと規格化係数κ，１／κとを表示していない。
【０２１７】
図２９〜図３１は、本実施形態でのＮ回目（Ｎは整数）〜Ｎ＋２回目の処理を模式的に示している。
【０２１８】
Ｎ回目処理（図２９）では、対象領域Ｃ１，Ｃ２，Ｃ３，Ｃ４の４個の変換処理が１クロック周期内に並列に同時実行される。
【０２１９】
対象領域Ｃ１では、２点の中間データＤ¹ _n+4，Ｄ¹ _n+5を加算したデータにリフティング係数−δを乗算することで乗算値を算出した後、この乗算値と中間データＳ¹ _n+5とを加算するという３点演算が実行される。この結果、偶数番目の入力データＹ（２ｎ＋１０）を始点とする系列上の第２段階の中間データＳ² _n+5が算出される。ここで、２点の中間データＤ¹ _n+4，Ｄ¹ _n+5は、中間データＳ¹ _n+5の系列に対して１点前後する系列上のデータである。
【０２２０】
また、対象領域Ｃ２では、２点の中間データＳ² _n+3，Ｓ² _n+4を加算したデータにリフティング係数−γを乗算した後、この乗算値と中間データＤ¹ _n+3とを加算するという３点演算が実行される。この結果、奇数番目の入力データＹ（２ｎ＋７）を始点とする系列上の第２段階の中間データＤ² _n+3が算出される。ここで、２点の中間データＳ² _n+3，Ｓ² _n+4は、中間データＤ¹ _n+3の系列に対して１点前後する系列上のデータである。
【０２２１】
また、対象領域Ｃ３では、２点の中間データＤ² _n+1，Ｄ² _n+2を加算したデータにリフティング係数−βを乗算することで乗算値を算出した後、この乗算値と中間データＳ² _n+2とを加算するという３点演算が実行される。この結果、入力データＹ（２ｎ＋４）を始点とする系列上の出力データＸ（２ｎ＋４）が算出される。ここで、２点の中間データＤ² _n+1，Ｄ² _n+2は、中間データＳ² _n+2の系列に対して１点前後する系列上のデータである。
【０２２２】
また、対象領域Ｃ４では、偶数番目の２点の出力データＸ（２ｎ），Ｘ（２ｎ＋２）を加算したデータにリフティング係数−αを乗算することで乗算値を算出した後、この乗算値と中間データＤ² _nとを加算するという３点演算が実行される。この結果、入力データＹ（２ｎ＋１）を始点とする系列上の出力データＸ（２ｎ＋１）が算出される。ここで、偶数番目の２点の入力データＸ（２ｎ），Ｘ（２ｎ＋２）は、中間データＤ² _nに対して１点前後するデータである。
【０２２３】
また、前記対象領域Ｃ１〜Ｃ４における演算処理が実行される１クロック前の周期において、対象領域Ｎ１およびＮ２における演算処理が並列実行される。対象領域Ｎ１においては、入力データＹ（２ｎ＋１０）に規格化係数κを乗算する規格化処理が実行され中間データＳ¹ _n+5が算出され、対象領域Ｎ２においては、入力データＹ（２ｎ＋１１）に規格化係数１／κを乗算する規格化処理が実行され中間データＤ¹ _n+5が算出される。
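One round of the fourth embodiment applies the same three-point operation in four regions at once, and none of the four reads a value that another produces in the same round. The sketch below models that with a dictionary standing in for the third ring memory; the function name `process_round`, the key scheme such as `("S2", n + 3)` and the numeric values are all hypothetical.

```python
def three_point(a: float, b: float, c: float, coeff: float) -> float:
    """c - coeff * (a + b)"""
    return c - coeff * (a + b)


def process_round(mem: dict, n: int, alpha: float, beta: float,
                  gamma: float, delta: float) -> dict:
    """Return the four values produced in the N-th round (regions C1 to C4).

    mem maps labels to values, e.g. mem["D1", n + 4] for the first-stage
    intermediate data D1_{n+4}. All reads use the state before the round.
    """
    return {
        # C1: S2_{n+5} from D1_{n+4}, D1_{n+5}, S1_{n+5}
        ("S2", n + 5): three_point(mem["D1", n + 4], mem["D1", n + 5],
                                   mem["S1", n + 5], delta),
        # C2: D2_{n+3} from S2_{n+3}, S2_{n+4}, D1_{n+3}
        ("D2", n + 3): three_point(mem["S2", n + 3], mem["S2", n + 4],
                                   mem["D1", n + 3], gamma),
        # C3: X(2n+4) from D2_{n+1}, D2_{n+2}, S2_{n+2}
        ("X", 2 * n + 4): three_point(mem["D2", n + 1], mem["D2", n + 2],
                                      mem["S2", n + 2], beta),
        # C4: X(2n+1) from X(2n), X(2n+2), D2_n
        ("X", 2 * n + 1): three_point(mem["X", 2 * n], mem["X", 2 * n + 2],
                                      mem["D2", n], alpha),
    }


n = 0
keys = [("D1", 4), ("D1", 5), ("S1", 5), ("S2", 3), ("S2", 4), ("D1", 3),
        ("D2", 1), ("D2", 2), ("S2", 2), ("X", 0), ("X", 2), ("D2", 0)]
mem = {k: 1.0 for k in keys}          # illustrative values only
new_values = process_round(mem, n, 0.5, 0.5, 0.5, 0.5)
```

Two of the four results, X(2n+4) and X(2n+1), are finished outputs, which matches the statement that two output points are obtained per clock cycle.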
【０２２４】
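The details of this N-th processing are as follows. In the N-th processing, the operations in target regions C1, C2, C3 and C4 are performed within one clock cycle, and the operations in target regions N1 and N2 are performed one clock cycle earlier. The processing in that earlier clock cycle is described first. The MMU 92 causes the input data Y(2n+10) and Y(2n+11) temporarily stored in the ring memory 93 to be input to the first data selector 95, which, according to the selection control signal SEL0 supplied from the control unit 130, outputs the input data Y(2n+10) from its first terminal S0 and the input data Y(2n+11) from its second terminal S1.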
【０２２５】
第１端子Ｓ０から出力された入力データＹ（２ｎ＋１０）は、第１係数乗算器９６に入力される。第１係数乗算器９６において、係数レジスタ９７は、制御部１３０から供給された制御信号Ｃ０に従って規格化係数κを乗算器９８に出力し、乗算器９８は入力データＹ（２ｎ＋１０）に規格化係数κを乗算する。この結果、第１係数乗算器９６は、中間データＳ¹ _n+5（＝κ×Ｙ（２ｎ＋１０））を１クロック周期内に算出する。
【０２２６】
第２端子Ｓ１から出力された入力データＹ（２ｎ＋１１）は、第２係数乗算器１００に入力される。第２係数乗算器１００において、係数レジスタ１０１は、制御部１３０から供給された制御信号Ｃ１に従って規格化係数１／κを乗算器１０２に出力し、乗算器１０２は入力データＹ（２ｎ＋１１）に規格化係数１／κを乗算する。この結果、第２係数乗算器１００は、中間データＤ¹ _n+5（＝１／κ×Ｙ（２ｎ＋１１））を１クロック周期内に算出する。
【０２２７】
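The intermediate data S¹_{n+5} and D¹_{n+5} output from the first and second coefficient multipliers 96 and 100 are input to the delay registers 99 and 103, respectively. In the delay registers 99 and 103, the intermediate data S¹_{n+5} and D¹_{n+5} are delayed by one clock cycle and then output.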
【０２２８】
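One clock cycle after the operations in target regions N1 and N2, the MMU 92 causes the third ring memory 93 to output the ten temporarily stored data points X(2n), D²_n, X(2n+2), D²_{n+1}, S²_{n+2}, D²_{n+2}, S²_{n+3}, D¹_{n+3}, S²_{n+4}, D¹_{n+4} to the first data selector 95. The first data selector 95 outputs these ten data points from its third terminal S2 through twelfth terminal S11 according to the value of the selection control signal SEL0 supplied from the control unit 130. This output data is input to the second data selector 104. The intermediate data S¹_{n+5} and D¹_{n+5} stored in the delay registers 99 and 103 are also input to the second data selector 104. The intermediate data D¹_{n+5} output from the delay register 103 is in addition branched to the external MMU 92, which transfers it to the ring memory 93, overwriting the storage area of the already-referenced input data Y(2n+11).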
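[0229]
In accordance with the selection control signal SEL1 supplied from the control unit 130, the second data selector 104 selects, from the twelve data, the three data X(2n), X(2n+2), and D²_n in the target area C4 and outputs them from the first terminal S0 to the third terminal S2, respectively; the three data D²_{n+1}, D²_{n+2}, and S²_{n+2} in the target area C3 and outputs them from the fourth terminal S3 to the sixth terminal S5; the three data S²_{n+3}, S²_{n+4}, and D¹_{n+3} in the target area C2 and outputs them from the seventh terminal S6 to the ninth terminal S8; and the three data D¹_{n+4}, D¹_{n+5}, and S¹_{n+5} in the target area C1 and outputs them from the tenth terminal S9 to the twelfth terminal S11.
[0230]
The upstream adder 105 adds the two data X(2n) and X(2n+2) of the target area C4, received from the first terminal S0 and the second terminal S1 of the second data selector 104, and outputs the sum to the third coefficient multiplier 106. In the third coefficient multiplier 106, the coefficient register 107 supplies the lifting coefficient α to the multiplier 108 in accordance with the control signal C2, and the multiplier 108 outputs the product of the input data and α (= α × (X(2n) + X(2n+2))). The sign of this product is inverted by the two's complement circuit 109, and the result is output to the downstream adder 110. The downstream adder 110 adds the value received from the third coefficient multiplier 106 to the data D²_n received from the third terminal S2 of the second data selector 104, thereby calculating the output data X(2n+1) of the target area C4, and outputs it to the output destination selection unit 129.
[0231]
The upstream adder 111 adds the two data D²_{n+1} and D²_{n+2} of the target area C3, received from the fourth terminal S3 and the fifth terminal S4 of the second data selector 104, and outputs the sum to the fourth coefficient multiplier 112. In the fourth coefficient multiplier 112, the coefficient register 113 supplies the lifting coefficient β to the multiplier 114 in accordance with the control signal C3, and the multiplier 114 outputs the product of the input data and β (= β × (D²_{n+1} + D²_{n+2})). The sign of this product is inverted by the two's complement circuit 115, and the result is output to the downstream adder 116. The downstream adder 116 adds the value received from the fourth coefficient multiplier 112 to the data S²_{n+2} received from the sixth terminal S5 of the second data selector 104, thereby calculating the output data X(2n+4) of the target area C3, and outputs it to the output destination selection unit 129.
[0232]
The upstream adder 117 adds the two data S²_{n+3} and S²_{n+4} of the target area C2, received from the seventh terminal S6 and the eighth terminal S7 of the second data selector 104, and outputs the sum to the fifth coefficient multiplier 118. In the fifth coefficient multiplier 118, the coefficient register 119 supplies the lifting coefficient γ to the multiplier 120 in accordance with the control signal C4, and the multiplier 120 outputs the product of the input data and γ (= γ × (S²_{n+3} + S²_{n+4})). The sign of this product is inverted by the two's complement circuit 121, and the result is output to the downstream adder 122. The downstream adder 122 adds the value received from the fifth coefficient multiplier 118 to the data D¹_{n+3} received from the ninth terminal S8 of the second data selector 104, thereby calculating the intermediate data D²_{n+3} of the target area C2, and outputs it to the output destination selection unit 129.
[0233]
The upstream adder 123 adds the two data D¹_{n+4} and D¹_{n+5} of the target area C1, received from the tenth terminal S9 and the eleventh terminal S10 of the second data selector 104, and outputs the sum to the sixth coefficient multiplier 124. In the sixth coefficient multiplier 124, the coefficient register 125 supplies the lifting coefficient δ to the multiplier 126 in accordance with the control signal C5, and the multiplier 126 outputs the product of the input data and δ (= δ × (D¹_{n+4} + D¹_{n+5})). The sign of this product is inverted by the two's complement circuit 127, and the result is output to the downstream adder 128. The downstream adder 128 adds the value received from the sixth coefficient multiplier 124 to the intermediate data S¹_{n+5} received from the twelfth terminal S11 of the second data selector 104, thereby calculating the intermediate data S²_{n+5} of the target area C1, and outputs it to the output destination selection unit 129.
[0234]
In accordance with the value of the selection control signal SEL2, the output destination selection unit 129 outputs the two output data received from the two downstream adders 110 and 116 from the first terminal K0 and the second terminal K1, respectively. It also outputs the three data received from the three downstream adders 116, 122, and 128 to the MMU 92. The MMU 92 transfers these three data X(2n+4), D²_{n+3}, and S²_{n+5} to the third ring memory 93 and overwrites the already-referenced storage areas that held S²_{n+2}, D¹_{n+3}, and Y(2n+10).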
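[0235]
In the next, (N+1)-th processing (FIG. 30), the conversion processes of the target areas C5, C6, C7, and C8 are performed, and the two normalization processes of the target areas N3 and N4 are executed one clock cycle earlier. The target areas C5, C6, C7, C8, N3, and N4 are the target areas C1, C2, C3, C4, N1, and N2 of the N-th processing (FIG. 29) shifted backward by two series (two points), and the same processing is executed in them as in the corresponding areas of the N-th processing. Accordingly, the target area C8 yields the output data X(2n+3) on the series starting from the odd-numbered input data Y(2n+3); the target area C7 yields the output data X(2n+6) on the series starting from the even-numbered input data Y(2n+6); the target area C6 yields the second-stage intermediate data D²_{n+4} on the series starting from the odd-numbered input data Y(2n+9); and the target area C5 yields the second-stage intermediate data S²_{n+6} on the series starting from the even-numbered input data Y(2n+12). In the preceding clock cycle, the target areas N3 and N4 normalize the input data Y(2n+12) and Y(2n+13).
[0236]
Further, in the (N+2)-th processing (FIG. 31), the conversion processes of the target areas C9, C10, C11, and C12 are performed, and the two normalization processes of the target areas N5 and N6 are executed one clock cycle earlier.
[0237]
In this way, the same processing as the N-th processing (FIG. 29) is repeated, with the target areas shifted each time, until the output data of all points have been calculated. The average time needed to calculate one even-numbered and one odd-numbered output datum (two points) is thereby reduced to one clock cycle, which greatly shortens the output data calculation period.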
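[0238]
Next, line-based two-dimensional inverse DWT processing using the wavelet transform device 90 is described.
[0239]
The data input to the first horizontal filtering unit 33H are the subbands 23LH and 23HH shown in FIG. 11, and the data input to the second horizontal filtering unit 33L are the subbands 23LL and 23HL. The first and second horizontal filtering units 33H and 33L output the subbands 23H (Y_H(m)) and 23L (Y_L(m)), respectively.
[0240]
The data input to the vertical filtering unit 94 are the data Y_H(m) and Y_L(m) output from the first and second horizontal filtering units 33H and 33L. The lines of Y_H(m) and Y_L(m) are arranged alternately in the vertical direction, and the data are input as pixel columns along the horizontal direction. The vertical filtering unit 94 then outputs the two-dimensional image data 23.
[0241]
Specifically, the first ring memory 32H and the first horizontal filtering unit 33H filter the data, input in units of horizontal lines, at one clock cycle per point and output the subband 23H; the second ring memory 32L and the second horizontal filtering unit 33L likewise filter the data, input in units of horizontal lines, at one clock cycle per point and output the subband 23L.
[0242]
The first ring memory 32H and the second ring memory 32L may be implemented with the ring memory 32s of FIG. 22 described in the third embodiment. In that case, as shown in FIG. 33, each has a storage area 133 that holds eight points (eight pixels) of data corresponding to the input data ..., X_j(k), X_{j+1}(k), ..., and can hold the temporary data and intermediate data described above. Alternatively, they may be implemented with the ring memory 32 of FIG. 15 described in the second embodiment. In that case, as shown in FIG. 21, each has a storage area 59 that holds nine points (nine pixels) of data corresponding to the input data ..., X_j(k), X_{j+1}(k), ..., and can hold the temporary data and intermediate data described above.
[0243]
Similarly, the first and second horizontal filtering units 33H and 33L may be implemented with the filtering unit 33s of FIG. 22 described in the third embodiment or with the filtering unit 33 of FIG. 15 described in the second embodiment.
[0244]
The third ring memory 93 and the vertical filtering unit 94 repeat each round of processing, including the N-th processing (FIG. 29) and the (N+1)-th processing (FIG. 30), for every pixel column, one horizontal line at a time. For example, the N-th processing (FIG. 29) is executed for the 0th pixel column, then for the 1st pixel column, then for the 2nd pixel column, and so on, until it is finally executed for the (W−1)-th pixel column. The (N+1)-th processing (FIG. 30) is then executed for the 0th pixel column, the 1st pixel column, the 2nd pixel column, and so on up to the (W−1)-th pixel column. Each round of processing is thus repeated for all pixel columns.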
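The line-based ordering described above (finish one round for every pixel column before starting the next round) can be sketched as a simple generator. The name `processing_order` and its parameters are hypothetical and chosen only for this illustration.

```python
def processing_order(num_rounds, width):
    """Yield (round_index, column_index) in line-based order.

    Each round (N-th, (N+1)-th, ...) is applied to pixel columns
    0 .. width-1 before the next round begins.
    """
    for round_index in range(num_rounds):
        for column in range(width):
            yield round_index, column
```

With `width = W`, each round therefore occupies W consecutive steps and produces one odd and one even output line.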
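[0245]
As a result, the vertical filtering unit 94 outputs even-row data and odd-row data in parallel, one horizontal line each. For example, executing the N-th processing (FIG. 29) consecutively for the 0th to (W−1)-th pixel columns outputs the odd-row data X₀(2n+1), X₁(2n+1), ..., X_j(2n+1), ..., X_{W−1}(2n+1) of the (2n+1)-th horizontal line in succession. In parallel, the even-row data X₀(2n+4), X₁(2n+4), ..., X_j(2n+4), ..., X_{W−1}(2n+4) of the (2n+4)-th horizontal line are output in succession.
[0246]
As shown schematically in FIG. 32, the third ring memory 93 has a storage area 132 that holds 12 × W points (12 lines) of data corresponding to the input data sequence, and can hold the temporary data and intermediate data described above. The storage area 132 is a set of column areas, each holding 12 points of data in the vertical direction. One column area holds the input data and intermediate data referenced in one round of processing. For example, in the N-th processing (FIG. 29), the contents of a given column area change from the data sequence {X(2n), D²_n, X(2n+2), D²_{n+1}, S²_{n+2}, D²_{n+2}, S²_{n+3}, D¹_{n+3}, S²_{n+4}, D¹_{n+4}, Y(2n+10), Y(2n+11)} to the data sequence {X(2n), D²_n, X(2n+2), D²_{n+1}, X(2n+4), D²_{n+2}, S²_{n+3}, D²_{n+3}, S²_{n+4}, D¹_{n+4}, S²_{n+5}, D¹_{n+5}} (the data S²_{n+2}, D¹_{n+3}, Y(2n+10), and Y(2n+11) are overwritten by the data X(2n+4), D²_{n+3}, S²_{n+5}, and D¹_{n+5}, respectively).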
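As an illustration of how one column area changes during the N-th processing, the following Python sketch applies the two normalizations and the four three-point operations to a 12-element list laid out in the order given above. It models only the arithmetic and the overwrite pattern, not the clock-level timing; the function name `run_round` is hypothetical.

```python
def run_round(col, alpha, beta, gamma, delta, kappa):
    """Apply one round (N1, N2, then C1..C4) to a 12-slot column area.

    col is ordered as: X(2n), D2[n], X(2n+2), D2[n+1], S2[n+2], D2[n+2],
    S2[n+3], D1[n+3], S2[n+4], D1[n+4], Y(2n+10), Y(2n+11).
    Returns (x_odd, x_even, new_col).
    """
    s1 = kappa * col[10]                         # N1: S1[n+5]
    d1 = col[11] / kappa                         # N2: D1[n+5]
    s2 = s1 - delta * (col[9] + d1)              # C1: S2[n+5]
    d2 = col[7] - gamma * (col[6] + col[8])      # C2: D2[n+3]
    x_even = col[4] - beta * (col[3] + col[5])   # C3: X(2n+4)
    x_odd = col[1] - alpha * (col[0] + col[2])   # C4: X(2n+1)

    new_col = list(col)
    new_col[4] = x_even    # overwrites S2[n+2]
    new_col[7] = d2        # overwrites D1[n+3]
    new_col[10] = s2       # overwrites Y(2n+10)
    new_col[11] = d1       # overwrites Y(2n+11)
    return x_odd, x_even, new_col
```

Only four of the twelve slots change per round, which matches the overwrite list in the paragraph above.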
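[0247]
By executing the above processing recursively, subbands (band components) of any decomposition level can be synthesized. That is, inputting the four subbands LL(k+1), HL(k+1), LH(k+1), and HH(k+1) of decomposition level k+1 (k is an integer of 2 or more) to the wavelet transform device 90 yields the subband LL(k) of decomposition level k, and repeating this recursively makes it possible to restore the original image data from the subbands of decomposition level k.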
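The recursion can be sketched as a loop that feeds each reconstructed LL subband into the next lower level. The sketch assumes a caller-supplied function `synthesize(ll, hl, lh, hh)` standing in for one pass through the device; both that function and the name `reconstruct` are hypothetical.

```python
def reconstruct(ll_top, details, synthesize):
    """Rebuild an image from the top-level LL subband.

    details is a list of (HL, LH, HH) tuples ordered from the highest
    decomposition level down to level 1. synthesize is assumed to perform
    one level of two-dimensional inverse DWT and return the next LL.
    """
    ll = ll_top
    for hl, lh, hh in details:
        ll = synthesize(ll, hl, lh, hh)
    return ll
```

After the last iteration the returned value is the level-0 result, that is, the restored image.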
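[0248]
As described above, the wavelet transform device 90 and the wavelet transform method according to this embodiment execute four conversion processes, which calculate four points of intermediate data, and two normalization processes, which normalize two points of data, simultaneously and in parallel within one clock cycle, so the output data calculation period is greatly shortened. The wavelet transform can therefore be executed at high speed in a very short time.
[0249]
In addition, because the wavelet transform device 90 includes the first and second horizontal filtering units 33H and 33L, each of which calculates one point of data per clock cycle, and the vertical filtering unit 94, which calculates two points of data per clock cycle, it can calculate two points of synthesized data in parallel per clock cycle. Line-based two-dimensional DWT operations can therefore be executed at very high speed.
[0250]
<Modification>
FIG. 34 shows the schematic configuration of a two-dimensional wavelet transform device 140 according to a modification of the fourth embodiment. The wavelet transform device 140 comprises a buffer 91 that temporarily holds the two-dimensional image data of the subbands, an MMU (memory management unit) 92A that operates in synchronization with an externally supplied clock signal CLK, a first ring memory 93A, a horizontal filtering unit 94A, a line buffer circuit 141, a second ring memory 93B, and a vertical filtering unit 94B.
[0251]
The horizontal filtering unit 94A and the vertical filtering unit 94B have the same configuration as the vertical filtering unit 94 (FIG. 28) of the fourth embodiment, and are supplied with data and controlled so as to execute the lifting operations shown in FIGS. 29 to 31.
[0252]
The horizontal filtering unit 94A outputs the data of the subbands 23H and 23L alternately, one horizontal line at a time.
[0253]
In the line buffer circuit 141, the first line buffer 143 and the second line buffer 144 each have capacity for two horizontal lines. While the selector 142 stores the two incoming lines of data in one of the first line buffer 143 and the second line buffer 144, the demultiplexer 145 reads the two lines already stored in the other and outputs them to the second ring memory 93B.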
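The alternating use of the two line buffers is a double-buffering (ping-pong) arrangement. A minimal software sketch, assuming each call to a hypothetical `exchange` method corresponds to one two-line period, could look like this:

```python
class PingPongLineBuffer:
    """Two banks, each holding two horizontal lines.

    While one bank is being written, the other bank (filled in the
    previous period) is read out.
    """

    def __init__(self):
        self.banks = [None, None]
        self.write_bank = 0

    def exchange(self, two_lines):
        """Store two incoming lines and return the pair stored one period ago.

        Returns None for the first period, when nothing has been stored yet.
        """
        self.banks[self.write_bank] = list(two_lines)
        ready = self.banks[1 - self.write_bank]
        self.write_bank = 1 - self.write_bank
        return ready
```

The writer and the reader never touch the same bank in the same period, which is what lets the horizontal and vertical stages run concurrently.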
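[0254]
The configuration of this modification also calculates two points of synthesized data in parallel per clock cycle, so line-based two-dimensional DWT operations can be executed at very high speed.
[0255]
[Effects of the invention]
As described above, the wavelet transform device according to the present invention repeatedly executes the process of normalizing each input datum and the conversion processes that convert each intermediate datum into other intermediate data or output data on the same series, and executes at least two of these repeated processes in parallel within one clock cycle. The output data calculation period is therefore shortened, and the inverse wavelet transform can be executed at high speed in a short time.
[0256]
Likewise, in the wavelet transform method according to the present invention, step (b) of normalizing the input data into first-stage intermediate data, step (c) of converting intermediate data into other intermediate data on the same series, and step (d) of converting final-stage intermediate data into output data are executed repeatedly, and at least two of these repeated steps are executed in parallel within one clock cycle. The period for calculating output data from the input data sequence is therefore shortened, and the inverse wavelet transform can be performed at high speed in a short time.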
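[Brief description of the drawings]
[FIG. 1] A diagram showing the schematic configuration of a wavelet transform device according to a first embodiment of the present invention.
[FIG. 2] A schematic configuration diagram of the filtering unit according to the first embodiment.
[FIG. 3] A diagram schematically showing a step of the lifting operation according to the first embodiment.
[FIG. 4] A diagram schematically showing a step of the lifting operation according to the first embodiment.
[FIG. 5] A diagram schematically showing a step of the lifting operation according to the first embodiment.
[FIG. 6] A diagram schematically showing a step of the lifting operation according to the first embodiment.
[FIG. 7] A diagram schematically showing a step of the lifting operation according to the first embodiment.
[FIG. 8] A diagram schematically showing a step of the lifting operation according to the first embodiment.
[FIG. 9] A diagram schematically showing a step of the lifting operation according to the first embodiment.
[FIG. 10] A diagram schematically showing a step of the lifting operation according to the first embodiment.
[FIG. 11] A diagram schematically showing the process of synthesizing an image from subbands.
[FIG. 12] A diagram schematically showing two-dimensional image data and the storage area of a ring memory.
[FIG. 13] A diagram schematically showing the storage area of a ring memory.
[FIG. 14] A diagram showing the schematic configuration of a wavelet transform device according to a second embodiment of the present invention.
[FIG. 15] A schematic configuration diagram of the filtering unit according to the second embodiment.
[FIG. 16] A diagram schematically showing a step of the lifting operation according to the second embodiment.
[FIG. 17] A diagram schematically showing a step of the lifting operation according to the second embodiment.
[FIG. 18] A diagram schematically showing a step of the lifting operation according to the second embodiment.
[FIG. 19] A diagram schematically showing a step of the lifting operation according to the second embodiment.
[FIG. 20] A diagram schematically showing two-dimensional image data and the storage area of a ring memory.
[FIG. 21] A diagram schematically showing the storage area of a ring memory.
[FIG. 22] A diagram showing the schematic configuration of a filtering unit according to a third embodiment of the present invention.
[FIG. 23] A diagram schematically showing a step of the lifting operation according to the third embodiment.
[FIG. 24] A diagram schematically showing a step of the lifting operation according to the third embodiment.
[FIG. 25] A diagram schematically showing a step of the lifting operation according to the third embodiment.
[FIG. 26] A diagram showing the schematic configuration of a wavelet transform device according to a modification of the second and third embodiments.
[FIG. 27] A diagram showing the schematic configuration of a wavelet transform device according to a fourth embodiment of the present invention.
[FIG. 28] A schematic configuration diagram of the vertical filtering unit according to the fourth embodiment.
[FIG. 29] A diagram schematically showing a step of the lifting operation according to the fourth embodiment.
[FIG. 30] A diagram schematically showing a step of the lifting operation according to the fourth embodiment.
[FIG. 31] A diagram schematically showing a step of the lifting operation according to the fourth embodiment.
[FIG. 32] A diagram schematically showing two-dimensional image data and the storage area of a ring memory.
[FIG. 33] A diagram schematically showing the storage area of a ring memory.
[FIG. 34] A diagram showing the schematic configuration of a wavelet transform device according to a modification of the fourth embodiment.
[FIG. 35] A diagram schematically showing the filter banks used in the DWT and the inverse DWT.
[FIG. 36] A diagram schematically showing image data subjected to a two-dimensional DWT at the third decomposition level.
[FIG. 37] A lattice diagram schematically showing the synthesis-side lifting configuration.
[FIG. 38] A diagram schematically showing the calculation method recommended by the JPEG2000 scheme.
[FIG. 39] A diagram schematically showing a step of the lifting operation.
[FIG. 40] A diagram schematically showing a step of the lifting operation.
[FIG. 41] A diagram schematically showing a step of the lifting operation.
[FIG. 42] A diagram schematically showing a step of the lifting operation.
[FIG. 43] A diagram schematically showing a step of the lifting operation.
[FIG. 44] A diagram schematically showing a step of the lifting operation.
[FIG. 45] A diagram schematically showing a step of the lifting operation.
[FIG. 46] A diagram schematically showing a step of the lifting operation.
[FIG. 47] A diagram schematically showing a step of the lifting operation.
[FIG. 48] A diagram schematically showing a step of the lifting operation.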
[Explanation of reference numerals]
1 Wavelet transform device
2 MMU (memory management unit)
3A, 3B Ring memories
4A, 4B Filtering units
5 Line buffer circuit
[0001]
[Technical field of the invention]
The present invention relates to a compression / decompression technique using wavelet transform.
[0002]
[Prior art]
As a high-efficiency coding scheme for image data, image compression and decompression based on the discrete wavelet transform (hereinafter "DWT") is known, and it is adopted in the JPEG2000 (Joint Photographic Experts Group 2000) scheme established by ISO (the International Organization for Standardization). Two methods of computing the DWT are known: convolution and a method based on the lifting scheme. Both produce the same result, but the lifting-based method has advantages over convolution, such as fast computation with less memory and suitability for lossless (reversible) compression.
[0003]
In general, the DWT can be configured using a filter bank that divides an original signal into a high frequency component (high frequency component) and a low frequency component (low frequency component). The inverse transform (inverse DWT) can be configured using a filter bank that synthesizes the band-divided high-frequency component and low-frequency component.
[0004]
FIG. 35 schematically shows the filter banks 200S and 200A used in the DWT and its inverse transform (inverse DWT). The decomposition-side filter bank 200S, which decomposes the input signal x(n) into two bands, a low-frequency component and a high-frequency component, consists of a low-pass filter 201L that passes the low-frequency component, a high-pass filter 201H that passes the high-frequency component, and first and second downsamplers 202 and 203. The low-pass filter 201L and the high-pass filter 201H are FIR filters that perform convolution. The first and second downsamplers 202 and 203 each discard every other point of the signal received from the filters 201L and 201H, respectively, and output a signal of half the length. In the JPEG2000 standard, the first downsampler 202 discards the odd-numbered samples and outputs the even-numbered samples (low-frequency component), and the second downsampler 203 discards the even-numbered samples and outputs the odd-numbered samples (high-frequency component).
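The behaviour of the two downsamplers can be shown with list slicing. This is a minimal sketch that assumes the filter outputs are already available as Python lists; the function name `downsample_pair` is hypothetical.

```python
def downsample_pair(filtered_low, filtered_high):
    """Model downsamplers 202 and 203.

    Keeps the even-numbered samples of the low-pass output and the
    odd-numbered samples of the high-pass output, halving each length.
    """
    low = filtered_low[0::2]    # downsampler 202: drop odd-numbered samples
    high = filtered_high[1::2]  # downsampler 203: drop even-numbered samples
    return low, high
```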
[0005]
The synthesis-side filter bank 200A, which synthesizes the input signals (low-frequency and high-frequency components), consists of first and second upsamplers 204 and 205, a low-pass filter 206L, a high-pass filter 206H, and an adder 207. The low-pass filter 206L and the high-pass filter 206H are FIR filters that perform convolution. In general, the synthesis-side filters 206L and 206H and the decomposition-side filters 201L and 201H are designed to satisfy the perfect reconstruction condition. The first and second upsamplers 204 and 205 insert a zero between adjacent points and output a signal of twice the length. The adder 207 adds the signals output from the synthesis-side filters 206L and 206H and outputs the synthesized signal x′(n). When the perfect reconstruction condition is satisfied, x(n) = x′(n) holds.
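Zero insertion by the upsamplers can be sketched as follows. The `phase` parameter, which selects whether the original samples land on even or odd positions, is an assumption added for illustration; the text above only states that zeros are inserted between points.

```python
def upsample(band, phase=0):
    """Insert zeros between samples, doubling the signal length.

    phase=0 places the samples at even positions, phase=1 at odd
    positions (hypothetical parameter, not taken from the patent).
    """
    out = [0] * (2 * len(band))
    out[phase::2] = band
    return out
```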
[0006]
The two-dimensional DWT can be executed by applying the decomposition-side filter bank 200S to two-dimensional image data repeatedly, first in the vertical direction and then in the horizontal direction. FIG. 36 is a band-division diagram schematically showing two-dimensional image data 210 that has been subjected to the DWT at the third decomposition level. Each block in the two-dimensional image data 210 represents a subband (band component). For example, the subband HH1 consists of the vertical high-frequency component (H) and the horizontal high-frequency component (H) at decomposition level 1, and the subband LH2 consists of the vertical high-frequency component (H) and the horizontal low-frequency component (L) at decomposition level 2. In general, the subband XYn (X and Y are each "H" or "L", and n is the decomposition level) consists of the vertical component Y and the horizontal component X at decomposition level n.
[0007]
The processing procedure of the DWT at decomposition level 3 is as follows. First, the decomposition-side filter bank 200S is applied twice to the entire two-dimensional image to generate the decomposition level 1 subbands HH1, HL1, LH1, and LL1 (not shown). Next, the decomposition-side filter bank 200S is applied twice to the lowest-frequency subband LL1 of decomposition level 1 to generate the decomposition level 2 subbands HH2, HL2, LH2, and LL2 (not shown). Then the decomposition-side filter bank 200S is applied twice to the lowest-frequency subband LL2 of decomposition level 2 to generate the decomposition level 3 subbands HH3, HL3, LH3, and LL3.
[0008]
Conversely, the processing procedure of the inverse DWT that synthesizes the subbands of decomposition level 3 is as follows. First, the synthesis-side filter bank 200A is applied twice to the subbands HH3, HL3, LH3, and LL3 to generate the lowest-frequency subband LL2 of decomposition level 2. Next, the synthesis-side filter bank 200A is applied twice to the decomposition level 2 subbands HH2, HL2, LH2, and LL2 to generate the lowest-frequency subband LL1 of decomposition level 1. Then the synthesis-side filter bank 200A is applied twice to the decomposition level 1 subbands HH1, HL1, LH1, and LL1 to generate the two-dimensional image.
[0009]
The above is an example with three decomposition levels, but the JPEG2000 scheme generally uses about three to eight decomposition levels, or more. Also, in this example the DWT is applied to an entire still image at once, but in practice, because of constraints such as the available memory, a still image is often divided into multiple rectangular areas called "tiles" and the DWT is executed tile by tile.
[0010]
The DWT and the inverse DWT can also be realized with a lifting configuration. Since the present invention relates to synthesis-side processing, the inverse DWT is described here. For the well-known 9×7-tap Daubechies filter, the relationship between the input data Y(2n), Y(2n+1), Y(2n+2) (n: integer) and the output data X(2n), X(2n+1) can be expressed by the lifting configuration defined by the following equation (1). Because the synthesis-side processing is an inverse DWT, Y denotes input data and X denotes output data throughout the following description.
[0011]
[Expression 1]

[0012]
In equation (1), the odd-numbered input data Y(2n+1) are the high-frequency component data obtained by the decomposition process, and the even-numbered input data Y(2n) are the low-frequency component data obtained by the decomposition process. The output data X(2n) and X(2n+1) are the data obtained by synthesizing the high-frequency and low-frequency components. The coefficients α, β, γ, and δ are called lifting coefficients, and the coefficients κ and 1/κ are called normalization coefficients. These coefficients α, β, γ, δ, κ, and 1/κ are uniquely derived from the filter coefficients of the 9×7-tap Daubechies filter.
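Equation (1) itself is not reproduced in this text. A reconstruction consistent with the definitions above and with the operations described for the lattice of FIG. 37 and for the target areas is sketched below. Only the label [step 3] is confirmed by the description; the other step numbers are an assumption.

```latex
\begin{aligned}
\text{[step 1]}\quad & S^{1}_{n} = \kappa\, Y(2n) \\
\text{[step 2]}\quad & D^{1}_{n} = \tfrac{1}{\kappa}\, Y(2n+1) \\
\text{[step 3]}\quad & S^{2}_{n} = S^{1}_{n} - \delta\,\bigl(D^{1}_{n-1} + D^{1}_{n}\bigr) \\
\text{[step 4]}\quad & D^{2}_{n} = D^{1}_{n} - \gamma\,\bigl(S^{2}_{n} + S^{2}_{n+1}\bigr) \\
\text{[step 5]}\quad & X(2n) = S^{2}_{n} - \beta\,\bigl(D^{2}_{n-1} + D^{2}_{n}\bigr) \\
\text{[step 6]}\quad & X(2n+1) = D^{2}_{n} - \alpha\,\bigl(X(2n) + X(2n+2)\bigr)
\end{aligned}
```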
[0013]
The lifting configuration defined by equation (1) can be represented by the lattice structure shown in FIG. 37. The grid points arranged in the vertical column at the left end of FIG. 37 represent the input data ..., Y(2n−1), Y(2n), ..., Y(2n+9), Y(2n+10), ..., that is, data in which the low-frequency and high-frequency component data decomposed by the DWT are arranged alternately. The grid points at the right ends of the line segments extending horizontally rightward from these input data represent the output data ..., X(2n−1), X(2n), ..., X(2n+9), X(2n+10), ..., respectively.
[0014]
The grid points on the line segment extending from the grid point of each input datum Y(k) (k: integer) to the grid point of the output datum X(k) represent a series of intermediate data. For example, on the line segment between the input data Y(2n) and the output data X(2n) there are grid points representing the intermediate data S¹_n and S²_n generated starting from the input data Y(2n).
[0015]
Calculation on the lattice structure follows rules (A) to (C) below. (A) The data represented by a grid point moves along the line segments extending rightward from that grid point. (B) Data moving along a line segment is multiplied by the coefficient attached to that line segment (coefficient multiplication). (C) At each grid point, the data arriving from the left along the line segments are added (addition). For example, the intermediate data S²_n on the line segment between the input/output data Y(2n) and X(2n) is calculated as S²_n = 1 × S¹_n − δ × D¹_{n−1} − δ × D¹_n. This equation corresponds to [step 3] of equation (1).
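Applied to a whole signal, the lattice rules amount to a sequence of in-place lifting passes. The sketch below is one way to write the synthesis-side computation in Python. Several things in it are assumptions that do not come from this text: the numerical coefficient values (the values commonly cited for the JPEG2000 9/7 filter), the mirror extension at the signal boundaries, and all function names. `forward_97` simply undoes the steps in reverse order and is included only to check the round trip.

```python
# Commonly cited 9/7 lifting constants (assumed, not given in this text).
ALPHA, BETA, GAMMA, DELTA = -1.586134342, -0.052980118, 0.882911075, 0.443506852
KAPPA = 1.230174105


def _mirror(i, n):
    """Reflect index i into [0, n) (symmetric boundary extension, assumed)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def _lift(v, parity, coeff):
    """In place: v[i] -= coeff * (v[i-1] + v[i+1]) for all i of one parity."""
    n = len(v)
    for i in range(parity, n, 2):
        v[i] -= coeff * (v[_mirror(i - 1, n)] + v[_mirror(i + 1, n)])


def inverse_97(y):
    """Synthesis: interleaved (low, high, low, ...) input Y to output X."""
    v = [KAPPA * s if i % 2 == 0 else s / KAPPA for i, s in enumerate(y)]
    _lift(v, 0, DELTA)   # S2 from S1 and neighbouring D1
    _lift(v, 1, GAMMA)   # D2 from D1 and neighbouring S2
    _lift(v, 0, BETA)    # X(2n) from S2 and neighbouring D2
    _lift(v, 1, ALPHA)   # X(2n+1) from D2 and neighbouring X(2n)
    return v


def forward_97(x):
    """Exact reversal of inverse_97, used here only as a consistency check."""
    v = list(x)
    _lift(v, 1, -ALPHA)
    _lift(v, 0, -BETA)
    _lift(v, 1, -GAMMA)
    _lift(v, 0, -DELTA)
    return [s / KAPPA if i % 2 == 0 else s * KAPPA for i, s in enumerate(v)]
```

Each pass updates samples of one parity using only samples of the other parity, which is why every pass can be undone exactly and why the scheme needs no extra working memory.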
[0016]
As shown in FIG. 37, the intermediate data S²_n, for example, is the sum of the data arriving from the three grid points D¹_{n−1}, S¹_n, and D¹_n to its left in the drawing. Every intermediate datum is likewise calculated by adding three data arriving from the three grid points to its left. The JPEG2000 scheme recommends that the calculation of one point of intermediate data be performed in two steps (Mathias Larsson Carlander, Media Lab, Ericsson Research, Sweden, "JPEG2000 Verification Model 9.1 (Technical description) WG1 N2165", 28 June 2001). FIG. 38 schematically shows the calculation method recommended by the JPEG2000 scheme. The grid points x₁, x₂, x₃, and y represent data, and α, β, and γ represent the coefficients attached to the line segments connecting the grid points. As shown in the figure, the temporary data z is calculated in step a, and the data y is then calculated in step b.
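The two-step calculation can be sketched as below. FIG. 38 is not reproduced here, so the exact pairing of inputs is an assumption: the sketch takes step a to combine x₁ and x₂ into the temporary value z, and step b to add the contribution of x₃. The function name `two_step` and the argument names `a`, `b`, `c` (standing for the coefficients α, β, γ of FIG. 38) are hypothetical.

```python
def two_step(x1, x2, x3, a, b, c):
    """Compute y = a*x1 + b*x2 + c*x3 in two steps, returning (z, y).

    Step a produces the temporary data z from two of the three inputs;
    step b completes y once the third input is available.
    """
    z = a * x1 + b * x2   # step a: temporary data
    y = z + c * x3        # step b: final data
    return z, y
```

Splitting the sum this way lets step a run before the third input exists, at the cost of storing z between the two steps.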
[0017]
[Non-Patent Document 1]
Mathias Larsson Carlander, Media Lab, Ericsson Research Institute, Sweden (Media Lab, Ericsson Research, Sweden), "JPEG2000 Verification Model 9.1 (Technical description) WG1 N2165", June 28, 2001 .
[0018]
[Problems to be solved by the invention]
However, the lifting calculation recommended by the above-described JPEG2000 system has a problem that the processing time required to calculate one point of output data is long as described below.
[0019]
FIG. 39 to FIG. 48 are lattice diagrams for explaining an example of the processing procedure of DWT inverse transform using the lifting configuration. Although not shown, it is assumed that the coefficients shown in FIG. 37 are associated with all the line segments connecting the lattice points. In FIG. 39 to FIG. 48, grid points filled in black represent input or calculated data points, and grid points filled only in the upper half are points of temporary data for which only the processing of step a is completed. The blank grid points represent uncalculated points that have not been processed in either step a or step b. All of the processes shown in these drawings are executed within one clock cycle.
[0020]
In the N-th (N: integer) process shown in FIG. 39, the input data Y(2n+4) in the target area N1 is normalized, which yields the first-stage intermediate data S¹_{n+2} on the series starting from the even-numbered input data Y(2n+4).
[0021]
Step a is executed in each of the (N+1)-th to (N+4)-th processes shown in FIGS. 40 to 43. In the (N+1)-th process (FIG. 40), the two intermediate data S¹_{n+2} and D¹_{n+1} in the target area A1 are used to calculate the second-stage temporary data (S²_{n+2}) on the series starting from the even-numbered input data Y(2n+4) (temporary data is written in parentheses to distinguish it). In the (N+2)-th process (FIG. 41), the two intermediate data D¹_{n+1} and S²_{n+1} in the target area A2 are used to calculate the second-stage temporary data (D²_{n+1}) on the series starting from the odd-numbered input data Y(2n+3). In the (N+3)-th process (FIG. 42), the two intermediate data S²_{n+1} and D²_n in the target area A3 are used to calculate the temporary output data (X(2n+2)) on the series starting from the even-numbered input data Y(2n+2). In the (N+4)-th process (FIG. 43), the two data D²_n and X(2n) in the target area A4 are used to calculate the temporary output data (X(2n+1)) on the series starting from the odd-numbered input data Y(2n+1).
[0022]
In the (N+5)-th process (FIG. 44), the input data Y(2n+5) in the target area N2 is normalized, which yields the first-stage intermediate data D¹_{n+2} on the series starting from the odd-numbered input data Y(2n+5).
[0023]
Step b is then executed in the (N+6)-th to (N+9)-th processes shown in FIGS. 45 to 48. In the (N+6)-th process (FIG. 45), the intermediate data S²_{n+2} is calculated in the target area B1 from the intermediate data D¹_{n+2} and the temporary data (S²_{n+2}). In the (N+7)-th process (FIG. 46), the intermediate data D²_{n+1} is calculated in the target area B2 from the intermediate data S²_{n+2} calculated in the (N+6)-th process and the temporary data (D²_{n+1}). In the (N+8)-th process (FIG. 47), the output data X(2n+2) is calculated in the target area B3 from the intermediate data D²_{n+1} calculated in the (N+7)-th process and the temporary output data (X(2n+2)) calculated in the (N+3)-th process. In the (N+9)-th process (FIG. 48), the output data X(2n+1) is calculated in the target area B4 from the output data X(2n+2) calculated in the (N+8)-th process and the temporary output data (X(2n+1)) calculated in the (N+4)-th process.
[0024]
Next, in the (N+10)-th process (not shown), the input data Y(2n+6) is normalized in the same way as in the N-th process, and thereafter the same processing as the (N+1)-th to (N+9)-th processes is repeated.
[0025]
Thus, calculating the synthesized output data X(2n+2) and X(2n+1) after the input data Y(2n+4) and Y(2n+5), in which low-frequency and high-frequency components alternate, have been input requires ten clock cycles, from the N-th to the (N+9)-th process. On average, five clock cycles are therefore needed per point of output data. A processing method that shortens this five-clock period and executes the inverse DWT at high speed is desired.
[0026]
In view of the above problems and the like, the present invention intends to provide a wavelet transform apparatus and a wavelet transform method that can efficiently execute a wavelet transform based on a lifting configuration in a short time.
[0027]
[Means for Solving the Problems]
In order to solve the above problem, the invention according to claim 1 is a wavelet transform device that synthesizes band-divided high-frequency component data and low-frequency component data on the basis of a lifting configuration, comprising a control unit and a filtering unit that takes in an input data sequence, formed by alternately arranging, pixel by pixel, a first data sequence consisting of one of the high-frequency and low-frequency components and a second data sequence consisting of the other, and calculates a synthesized output data sequence. The filtering unit includes: normalization means for executing one or more normalization processes, each of which converts an input datum of the input data sequence into first-stage intermediate data within one clock cycle per point by multiplying it by a predetermined normalization coefficient; and intermediate data conversion means for executing one or more conversion processes, each of which either converts the first-stage intermediate data normalized by the normalization means into intermediate data of one or more further stages on the same series within one clock cycle per point, or converts final-stage intermediate data into output data within one clock cycle per point. The control unit controls the normalization means and the intermediate data conversion means so that the one or more normalization processes and the one or more conversion processes are executed repeatedly until the output data of all points have been calculated, and so that at least two of the repeatedly executed normalization and conversion processes are executed in parallel within one clock cycle.
[0028]
A second aspect of the present invention is the wavelet transform device according to the first aspect, wherein the normalization unit and the intermediate data conversion unit execute the normalization process and the conversion process in parallel.
[0029]
The invention according to claim 3 is the wavelet transform device according to claim 1 or 2, wherein the normalization means includes a normalization coefficient multiplier that multiplies each input datum by the normalization coefficient, and a delay unit that delays the data output from the normalization coefficient multiplier; the intermediate data conversion means includes a two-point arithmetic unit, comprising a lifting coefficient multiplier that multiplies one of two points of intermediate data by a predetermined lifting coefficient and an adder that adds the data output from the lifting coefficient multiplier to the other of the two points of intermediate data, and an output destination selection unit that takes in the data output from the two-point arithmetic unit and outputs it to an output destination designated by the control unit; and the wavelet transform device further comprises a memory management unit and a memory that temporarily stores data under the control of the memory management unit, the memory management unit performing control so that the data output from the output destination selection unit is transferred to and stored in the memory.
[0030]
The invention according to claim 4 is the wavelet transform device according to claim 3, wherein the control unit performs, as the conversion process, “a series starting from input data belonging to the second data string” (hereinafter, referred to as “first series”). The intermediate data of the first stage above and the “series starting from input data belonging to the first data string” one point before the intermediate data (hereinafter referred to as the first series). .) By adding the intermediate data of the first stage to the data obtained by multiplying a predetermined lifting coefficient, the temporary data of the second stage on the second series is added within one clock cycle per point. A first conversion process to be calculated; the temporary data calculated in the first conversion process and stored in the memory; and a first stage on the first series one point after the temporary data series. Multiply intermediate data by a predetermined lifting coefficient The second conversion process for calculating the intermediate data of the second stage on the second series within one clock cycle per point by adding the data obtained in the above, and the first stage on the first series By adding the intermediate data and the data obtained by multiplying the intermediate data of the second stage on the second series one point before the intermediate data by a predetermined lifting coefficient, the intermediate data A third conversion process for calculating second-stage temporary data within one clock cycle per point; the temporary data calculated by the third conversion process and stored in the memory; and a series of the temporary data On the other hand, by adding the intermediate data of the second stage on the second series one point later to the data obtained by multiplying the predetermined lifting coefficient, the intermediate data of the second stage on the first series is added. 
The fourth variable calculated within one clock cycle per point Processing, intermediate data of the M stage on the second series (stage number M is an integer of 1 or more), and intermediate data of the M stage on the first series one point before the series of the intermediate data A fifth conversion process for calculating the M + 1-th stage temporary data on the second series within one clock cycle by adding the data obtained by multiplying by a predetermined lifting coefficient; 5 by multiplying the temporary data calculated by the conversion process of 5 and stored in the memory with the intermediate data of the M-th stage on the first series one point after the temporary data series by a predetermined lifting coefficient. By adding the obtained data, the sixth conversion process for calculating the M + 1 stage intermediate data on the second series within one clock cycle per point, and the L stage (stage) on the first series The number L is an integer greater than or equal to 1) and the intermediate data The data obtained by multiplying the intermediate data of the (L + 1) th stage on the second series one point before the data series by a predetermined lifting coefficient is added to the (L + 1) th stage on the first series. Of the temporary data in one clock cycle per point, the temporary data calculated in the seventh conversion process and stored in the memory, and 1 for the temporary data series By adding the intermediate data at the (L + 1) th stage on the second series after the point and the data obtained by multiplying the predetermined lifting coefficient, the intermediate data at the (L + 1) th stage on the first series is added to each point. Control is performed so that the second point calculation unit repeatedly executes the eighth conversion processing calculated within one clock cycle until the output data of all points is calculated.
[0031]
The invention according to claim 5 is the wavelet transform device according to claim 4, wherein the control unit performs control so that, after the first conversion process and the third conversion process are executed, the two-point arithmetic unit executes the fifth conversion process and the seventh conversion process until the final-stage temporary data is calculated, and then, after the second conversion process and the fourth conversion process are executed, the two-point arithmetic unit executes the sixth conversion process and the eighth conversion process until the output data is calculated.
[0032]
A sixth aspect of the present invention is the wavelet transform device according to the fourth aspect of the present invention, comprising the two two-point arithmetic units that operate independently of each other, and the control unit performs the second data as the conversion process. A second conversion process for calculating second-stage intermediate data on a sequence belonging to a column and starting from P-th input data (data number P is an integer) in the input data sequence; The third conversion process for calculating the second stage temporary data on the series starting from the th input data and the M + 1 stage intermediate data on the series starting from the P-4th input data Each of the two-point arithmetic units includes four steps of the sixth conversion process to be performed and the seventh conversion process to calculate L + 1 stage temporary data on the series starting from the P-5th input data. In parallel with P + 2 The first conversion process for calculating second-stage temporary data on the series starting from data and the second-stage intermediate data on the series starting from the P-1th input data are calculated. The fourth conversion process, the fifth conversion process for calculating the M-th stage temporary data on the series starting from the P-2th input data, and the P-5th input data as the start point Control is performed so that each of the two point calculation units executes the four processes of the eighth conversion process for calculating the intermediate data of the (L + 1) -th stage on the series to be executed.
[0033]
The invention according to claim 7 is the wavelet transform device according to claim 1 or 2, wherein the normalization means comprises a normalization coefficient multiplier that multiplies each input data by the normalization coefficient, and a delay unit that delays the data output from the normalization coefficient multiplier; the intermediate data conversion means comprises a three-point arithmetic unit, which includes a first adder that adds first input data and second input data among three fetched input data, a lifting coefficient multiplier that multiplies the data output from the first adder by a predetermined lifting coefficient, and a second adder that adds the data output from the lifting coefficient multiplier and third input data to calculate intermediate data, and an output destination selection unit that takes in the intermediate data output from the three-point arithmetic unit and outputs it to the output destination designated by the control unit; and the memory management unit performs control so that the intermediate data output from the output destination selection unit is transferred to and stored in the memory.
[0034]
The invention according to claim 8 is the wavelet transform device according to claim 7, wherein the control unit performs control such that the three-point arithmetic unit repeatedly executes the following conversion processes until the output data of all points is calculated:
a first conversion process that calculates, within one clock cycle per point, second-stage intermediate data on a "series starting from input data belonging to the second data string" (hereinafter referred to as the second series) by adding the first-stage intermediate data on the second series to data obtained by multiplying, by a predetermined lifting coefficient, the sum of two points of first-stage intermediate data on the "series starting from input data belonging to the first data string" (hereinafter referred to as the first series) located one point before and one point after the series of that intermediate data;
a second conversion process that calculates, within one clock cycle per point, second-stage intermediate data on the first series by adding the first-stage intermediate data on the first series to data obtained by multiplying, by a predetermined lifting coefficient, the sum of two points of second-stage intermediate data on the second series located one point before and one point after the series of that first-stage intermediate data;
a third conversion process that calculates, within one clock cycle per point, (M+1)-th stage intermediate data on the second series by adding the M-th stage intermediate data (the number of stages M is an integer of 1 or more) on the second series to data obtained by multiplying, by a predetermined lifting coefficient, the sum of two points of M-th stage intermediate data on the first series located one point before and one point after the series of that M-th stage intermediate data; and
a fourth conversion process that calculates, within one clock cycle, (L+1)-th stage intermediate data on the first series by adding the L-th stage intermediate data (the number of stages L is an integer of 1 or more) on the first series to data obtained by multiplying, by a predetermined lifting coefficient, the sum of two points of (L+1)-th stage intermediate data on the second series located one point before and one point after the series of that L-th stage intermediate data.
[0035]
The invention according to claim 9 is the wavelet transform device according to claim 8, comprising two of the three-point arithmetic units operating independently of each other, wherein the control unit performs control such that the three-point arithmetic units execute in parallel two processes: the second conversion process, which calculates intermediate data on the series belonging to the first data string and starting from the P-th input data (data number P is an integer) in the input data string, and the fourth conversion process, which calculates (L+1)-th stage intermediate data on the series starting from the (P−4)-th input data.
[0036]
A tenth aspect of the present invention is the wavelet transform device according to the eighth or ninth aspect, wherein the control unit performs control such that the three-point arithmetic units execute in parallel two processes: the first conversion process, which calculates intermediate data on the series starting from the (P+3)-th input data (data number P is an integer) in the input data string, and the third conversion process, which calculates (M+1)-th stage intermediate data on the series starting from the (P−1)-th input data.
[0037]
The invention according to claim 11 is the wavelet transform device according to claim 8, wherein the control unit performs control such that the first to fourth conversion processes are executed in parallel.
[0038]
The invention according to claim 12 is the wavelet transform device according to any one of claims 1 to 11, wherein the filtering unit includes a first filtering unit and a second filtering unit connected in series; the first filtering unit takes in the data of the high-frequency component and the low-frequency component that have been band-divided in one of the horizontal direction and the vertical direction and calculates combined data by synthesizing these data; and the second filtering unit processes the combined data calculated by the first filtering unit, thereby calculating combined data in the other of the horizontal direction and the vertical direction.
[0039]
A thirteenth aspect of the present invention is a wavelet transform method for synthesizing band-divided high-frequency component data and low-frequency component data based on a lifting configuration, comprising: (a) a step of selectively fetching input data from an input data string formed by alternately arranging a first data string consisting of one of the high-frequency component and the low-frequency component and a second data string consisting of the other; (b) a step of multiplying each input data fetched in step (a) by a normalization coefficient to convert it into first-stage intermediate data within one clock cycle per point; and (c) a step of converting m-th stage intermediate data (m is an integer of 1 or more) into (m+1)-th stage intermediate data within one clock cycle per point (where, when the m-th stage intermediate data is the final-stage intermediate data, the (m+1)-th stage intermediate data is the output data), wherein steps (b) and (c) are repeatedly executed until the output data of all points is calculated, and the repeatedly executed steps (b) and (c) are executed in parallel within one clock cycle.
[0040]
The invention according to a fourteenth aspect is the wavelet transform method according to the thirteenth aspect, wherein step (c) includes the following steps, and steps (c-1) to (c-8) are controlled to be repeatedly executed until the output data of all points is calculated:
(c-1) a step of calculating, within one clock cycle per point, second-stage temporary data on a "series starting from input data belonging to the second data string" (hereinafter referred to as the second series) by adding the first-stage intermediate data on the second series to data obtained by multiplying, by a predetermined lifting coefficient, the first-stage intermediate data on the "series starting from input data belonging to the first data string" (hereinafter referred to as the first series) one point before the series of that intermediate data;
(c-2) a step of calculating, within one clock cycle per point, second-stage intermediate data on the second series by adding the temporary data calculated in step (c-1) and stored in the memory to data obtained by multiplying, by a predetermined lifting coefficient, the first-stage intermediate data on the first series one point after the series of the temporary data;
(c-3) a step of calculating, within one clock cycle per point, second-stage temporary data on the first series by adding the first-stage intermediate data on the first series to data obtained by multiplying, by a predetermined lifting coefficient, the second-stage intermediate data on the second series one point before the series of that intermediate data;
(c-4) a step of calculating, within one clock cycle per point, second-stage intermediate data on the first series by adding the temporary data calculated in step (c-3) and stored in the memory to data obtained by multiplying, by a predetermined lifting coefficient, the second-stage intermediate data on the second series one point after the series of the temporary data;
(c-5) a step of calculating, within one clock cycle, (M+1)-th stage temporary data on the second series by adding the M-th stage intermediate data (the number of stages M is an integer of 1 or more) on the second series to data obtained by multiplying, by a predetermined lifting coefficient, the M-th stage intermediate data on the first series one point before the series of that intermediate data;
(c-6) a step of calculating, within one clock cycle per point, (M+1)-th stage intermediate data on the second series by adding the temporary data calculated in step (c-5) and stored in the memory to data obtained by multiplying, by a predetermined lifting coefficient, the M-th stage intermediate data on the first series one point after the series of the temporary data;
(c-7) a step of calculating, within one clock cycle, (L+1)-th stage temporary data on the first series by adding the L-th stage intermediate data (the number of stages L is an integer of 1 or more) on the first series to data obtained by multiplying, by a predetermined lifting coefficient, the (L+1)-th stage intermediate data on the second series one point before the series of that intermediate data; and
(c-8) a step of calculating, within one clock cycle per point, (L+1)-th stage intermediate data on the first series by adding the temporary data calculated in step (c-7) and stored in the memory to data obtained by multiplying, by a predetermined lifting coefficient, the (L+1)-th stage intermediate data on the second series one point after the series of the temporary data.
[0041]
The invention according to claim 15 is the wavelet transform method according to claim 14, wherein, after step (c-1) and step (c-3) are executed, step (c-5) and step (c-7) are executed until the temporary data of the output data is calculated, and then, after step (c-2) and step (c-4) are executed, step (c-6) and step (c-8) are executed until the output data is calculated.
[0042]
The invention according to claim 16 is the wavelet transform method according to claim 14, wherein control is performed such that the following four steps are executed in parallel: step (c-2), which calculates second-stage intermediate data on the series belonging to the second data string and starting from the P-th input data (data number P is an integer) in the input data string; step (c-3), which calculates second-stage temporary data on the series starting from the (P−1)-th input data; step (c-6), which calculates (M+1)-th stage intermediate data on the series starting from the (P−4)-th input data; and step (c-7), which calculates (L+1)-th stage temporary data on the series starting from the (P−5)-th input data; and such that the following four steps are also executed in parallel: step (c-1), which calculates second-stage temporary data on the series starting from the (P+2)-th input data; step (c-4), which calculates second-stage intermediate data on the series starting from the (P−1)-th input data; step (c-5), which calculates (M+1)-th stage temporary data on the series starting from the (P−2)-th input data; and step (c-8), which calculates (L+1)-th stage intermediate data on the series starting from the (P−5)-th input data.
[0043]
The invention according to claim 17 is the wavelet transform method according to claim 13, wherein step (c) includes the following steps, and steps (c-1) to (c-4) are repeatedly executed until the output data of all points is calculated:
(c-1) a step of calculating, within one clock cycle per point, second-stage intermediate data on a "series starting from input data belonging to the second data string" (hereinafter referred to as the second series) by adding the first-stage intermediate data on the second series to data obtained by multiplying, by a predetermined lifting coefficient, the sum of two points of first-stage intermediate data on the "series starting from input data belonging to the first data string" (hereinafter referred to as the first series) located one point before and one point after the series of that intermediate data;
(c-2) a step of calculating, within one clock cycle per point, second-stage intermediate data on the first series by adding the first-stage intermediate data on the first series to data obtained by multiplying, by a predetermined lifting coefficient, the sum of two points of second-stage intermediate data on the second series located one point before and one point after the series of that intermediate data;
(c-3) a step of calculating, within one clock cycle per point, (M+1)-th stage intermediate data on the second series by adding the M-th stage intermediate data (the number of stages M is an integer of 1 or more) on the second series to data obtained by multiplying, by a predetermined lifting coefficient, the sum of two points of M-th stage intermediate data on the first series located one point before and one point after the series of that M-th stage intermediate data; and
(c-4) a step of calculating, within one clock cycle, (L+1)-th stage intermediate data on the first series by adding the L-th stage intermediate data (the number of stages L is an integer of 1 or more) on the first series to data obtained by multiplying, by a predetermined lifting coefficient, the sum of two points of (L+1)-th stage intermediate data on the second series located one point before and one point after the series of that L-th stage intermediate data.
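As an illustrative aside that is not part of the claims, the relation between the three-point form of steps (c-1) to (c-4) and the two-point form that splits each update into temporary data and final data can be sketched in Python. The function names and sample values below are hypothetical; the sketch assumes only that a lifting update has the form "center plus coefficient times neighbor(s)".

```python
def three_point(before, center, after, coeff):
    """One lifting update in three-point form: sum both neighbours, scale, add."""
    return center + coeff * (before + after)


def two_point_temp(before, center, coeff):
    """First half of the two-point form: temporary data from the earlier neighbour."""
    return center + coeff * before


def two_point_final(temp, after, coeff):
    """Second half of the two-point form: finish the update with the later neighbour."""
    return temp + coeff * after


# Hypothetical sample values (dyadic, so the floating-point arithmetic is exact).
before, center, after, coeff = 4.0, 10.0, 8.0, -0.5
temp = two_point_temp(before, center, coeff)
split_result = two_point_final(temp, after, coeff)
direct_result = three_point(before, center, after, coeff)
```

Under these assumptions both forms give the same value for a lattice point; the two-point form only needs the temporary data to be held in memory between its two halves.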
[0044]
The invention according to claim 18 is the wavelet transform method according to claim 17, wherein control is performed such that two steps are executed in parallel: step (c-2), which calculates second-stage intermediate data on the series belonging to the first data string and starting from the P-th input data (data number P is an integer) in the input data string, and step (c-4), which calculates (L+1)-th stage intermediate data on the series starting from the (P−4)-th input data.
[0045]
The invention according to claim 19 is the wavelet transform method according to claim 17 or claim 18, wherein control is performed such that two steps are executed in parallel: step (c-1), which calculates intermediate data on the series starting from the (P+3)-th input data (data number P is an integer) in the input data string, and step (c-3), which calculates (M+1)-th stage intermediate data on the series starting from the (P−1)-th input data.
[0046]
A twentieth aspect of the present invention is the wavelet transform method according to the seventeenth aspect, wherein the steps (c-1) to (c-4) are executed in parallel.
[0047]
The invention according to claim 21 is the wavelet transform method according to any one of claims 13 to 20, wherein, for two-dimensional image data band-divided into a low-frequency component and a high-frequency component, a composite data string is calculated by applying steps (a) to (c) line by line in one of the horizontal direction and the vertical direction of the two-dimensional image data, and steps (a) to (c) are then applied to the calculated composite data string in the other of the horizontal direction and the vertical direction.
[0048]
DETAILED DESCRIPTION OF THE INVENTION
<First Embodiment>
The wavelet transform apparatus and wavelet transform method according to the first embodiment of the present invention will be described below. FIG. 1 is a diagram illustrating a schematic configuration of a wavelet transform device 1 according to the first embodiment. The wavelet transform device 1 includes a buffer 8 that temporarily holds high-frequency component or low-frequency component subband data decomposed by wavelet transform, an MMU (memory management unit) 2 that operates in synchronization with an externally supplied clock signal CLK, a first ring memory 3A, a horizontal filtering unit 4A, a line buffer circuit 5, a second ring memory 3B, and a vertical filtering unit 4B. Here, the first ring memory 3A, the horizontal filtering unit 4A, the line buffer circuit 5, the second ring memory 3B, and the vertical filtering unit 4B operate in synchronization with the externally supplied pixel clock signal PCLK.
[0049]
In the present embodiment, the MMU 2, the horizontal filtering unit 4A, and the vertical filtering unit 4B are configured by hardware, but instead may be configured by a computer program including an instruction group executed by a microprocessor.
[0050]
The subband data input to the wavelet transform device 1 is temporarily stored in the buffer 8. The wavelet transform device 1 has a function of performing line-based two-dimensional inverse DWT once on subband data. The horizontal filtering unit 4A and the vertical filtering unit 4B are connected in series via the line buffer circuit 5 and the second ring memory 3B. As will be described later, the subband data is filtered in the horizontal direction by the horizontal filtering unit 4A and then filtered in the vertical direction by the vertical filtering unit 4B. When executing the two-dimensional inverse DWT on the data of the second or higher order decomposition level, the wavelet transform device 1 may be repeatedly used twice or more.
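The horizontal-then-vertical order described above can be sketched in Python as follows. This is a whole-array software sketch, not the line-based hardware; `inverse_2d_once` and `stand_in` are hypothetical names, and `stand_in` is a placeholder that merely doubles each value so that the order of the passes is visible. It is not a wavelet filter.

```python
def inverse_2d_once(data, inverse_1d):
    """Apply a 1-D routine to every row (horizontal), then to every column (vertical)."""
    rows = [inverse_1d(list(row)) for row in data]          # horizontal pass
    cols = [inverse_1d(list(col)) for col in zip(*rows)]    # vertical pass
    return [list(row) for row in zip(*cols)]                # back to row-major order


calls = []  # records the lines in the order they are filtered


def stand_in(line):
    """Placeholder 1-D routine (assumption): doubles each value."""
    calls.append(tuple(line))
    return [2 * v for v in line]


result = inverse_2d_once([[1, 2], [3, 4]], stand_in)
```

For data of the second or higher decomposition level, such a routine would be applied once per level, mirroring the repeated use of the device mentioned above.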
[0051]
The MMU 2 has a function of controlling data input/output among the buffer 8, the first ring memory 3A, and the second ring memory 3B, and can transfer the input data read from the buffer 8 to the first ring memory 3A for storage. The horizontal filtering unit 4A performs filtering in the horizontal direction on the data input from the first ring memory 3A, and can thereby calculate, in eight clock cycles of the pixel clock signal PCLK, two points of output data in which the high-frequency component in the horizontal direction and the low-frequency component in the same direction are combined, and output them to the line buffer circuit 5. Therefore, the average period required to calculate one point of output data is four clock cycles.
[0052]
The data output from the line buffer circuit 5 is stored in the second ring memory 3B. The MMU 2 inputs the data fetched from the second ring memory 3B to the vertical filtering unit 4B. The vertical filtering unit 4B performs filtering in the vertical direction on the input data, thereby calculating and outputting, in eight cycles of the pixel clock signal PCLK, two points of output data in which the high-frequency component in the vertical direction and the low-frequency component in the same direction are combined.
[0053]
The configuration of the horizontal filtering unit 4A and the configuration of the vertical filtering unit 4B are the same. FIG. 2 shows a schematic configuration of the filtering unit 4 (horizontal filtering unit 4A or vertical filtering unit 4B). The ring memory 3 shown in FIG. 2 represents one of the first ring memory 3A and the second ring memory 3B shown in FIG. 1.
[0054]
The filtering unit 4 includes a first data selector 11 that selectively takes in input data, a first coefficient multiplier 12, a delay register 16, a second data selector 17, a second coefficient multiplier 18, an adder 22, an output destination selection unit (DMUX) 23, and a control unit 24. The control unit 24 operates in synchronization with the pixel clock signal PCLK. In accordance with the values of the selection control signals SEL0 and SEL1 respectively supplied from the control unit 24, the first and second data selectors 11 and 17 output the input data taken in from the ring memory 3, or the data held in the delay register 16, to the first terminal S0 and the second terminal S1.
[0055]
The first coefficient multiplier 12 multiplies the data output from the first terminal S0 of the first data selector 11 by either of the normalization coefficients κ and 1/κ, according to the control signal C0 supplied from the control unit 24, and outputs the result (normalization processing). The data output from the first coefficient multiplier 12 is delayed by one clock cycle of the pixel clock signal PCLK by the delay register 16 and is then input to the second data selector 17. The first coefficient multiplier 12 and the delay register 16 constitute the normalization means of the present invention.
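The behaviour of the normalization means (a coefficient multiplier followed by a one-clock delay register) can be modelled roughly as below. This is a behavioural sketch only: the class and method names are hypothetical, the boolean `use_kappa` stands in for the control signal C0, and the numeric value of κ is the one commonly quoted for the JPEG 2000 9×7 filter, which is assumed here rather than taken from this description.

```python
KAPPA = 1.230174104914001  # assumed value of the normalization coefficient


class NormalizationStage:
    """Coefficient multiplier followed by a one-clock delay register (behavioural model)."""

    def __init__(self):
        self.delay_register = None  # nothing is latched before the first clock

    def clock(self, sample, use_kappa):
        """Advance one clock: emit the previously latched value, then latch the new product."""
        emitted = self.delay_register
        coeff = KAPPA if use_kappa else 1.0 / KAPPA  # selection made by the control signal
        self.delay_register = coeff * sample
        return emitted


stage = NormalizationStage()
first = stage.clock(2.0, use_kappa=True)    # nothing to emit yet
second = stage.clock(3.0, use_kappa=False)  # emits KAPPA * 2.0 latched one clock earlier
third = stage.clock(0.0, use_kappa=True)    # emits 3.0 / KAPPA
```

The one-clock delay is what allows the normalized value to reach the second data selector in the same cycle as the neighbouring data read from the ring memory.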
[0056]
In addition, the second coefficient multiplier 18 multiplies the data output from the first terminal S0 of the second data selector 17 by one of the lifting coefficients −α, −β, −γ, and −δ, according to the control signal C1 supplied from the control unit 24, and outputs the result (coefficient multiplication processing). The adder 22 adds the data output from the second coefficient multiplier 18 and the data output from the second terminal S1 of the second data selector 17 and outputs the result to the output destination selection unit 23 (addition processing). The data normalized by 1/κ is also output from the third terminal S2 of the second data selector 17 to the external MMU 2. The MMU 2 can transfer the data output from the third terminal S2 of the second data selector 17 to the ring memory 3 and store it there.
[0057]
Further, the output destination selection unit 23 outputs the data input from the adder 22 from any one of the first terminal K0 to the third terminal K2 according to the value of the selection control signal SEL2 supplied from the control unit 24. The coefficient multiplication process and the addition process in the second coefficient multiplier 18 and the adder 22 are executed within one clock cycle per point. Therefore, a period required to multiply and add one point of input data by a lifting coefficient is one cycle of the pixel clock signal PCLK.
[0058]
The coefficient register 19 and the adder 22 constitute a two-point calculation unit that takes in and calculates two points of input data from the data selector 17. The two-point calculation unit and the output destination selection unit 23 constitute intermediate data calculation means.
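A rough functional model of the two-point calculation followed by output destination selection is given below. The function names are hypothetical, and the mapping of `sel2` values 0, 1, 2 to the terminals K0, K1, K2 is an assumption made for illustration; the description only states that one terminal is chosen according to SEL2.

```python
def two_point_unit(to_multiplier, to_adder, coeff):
    """Multiply one input by a lifting coefficient, invert the sign, add the other input."""
    product = coeff * to_multiplier   # coefficient multiplication
    negated = -product                # sign inversion, giving -coeff * input
    return to_adder + negated         # addition


def select_output(value, sel2):
    """Route the adder result to exactly one of the terminals K0, K1, K2 (assumed encoding)."""
    terminals = {"K0": None, "K1": None, "K2": None}
    terminals[("K0", "K1", "K2")[sel2]] = value
    return terminals


result = two_point_unit(to_multiplier=4.0, to_adder=10.0, coeff=0.5)
routed = select_output(result, sel2=2)
```

Here the sample inputs and the coefficient 0.5 are arbitrary placeholders chosen so that the arithmetic is exact.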
[0059]
The data output from the first terminal K0 and the second terminal K1 of the output destination selection unit 23 is output to the outside as output data obtained by combining the input data of the low frequency component and the high frequency component.
[0060]
In addition, the data output from the second terminal K1 of the output destination selection unit 23 is branched and output to the external MMU 2. Further, the data output from the third terminal K2 is output to the external MMU2. The MMU 2 can transfer and store data output from the second terminal K1 and the third terminal K2 to the ring memory 3 respectively.
[0061]
Next, a representative example of the lifting operation using the above filtering unit 4 will be described with reference to FIGS. 3 to 10. FIGS. 3 to 10 are lattice diagrams schematically showing a lifting configuration of a 9 × 7 tap Daubechies filter. The calculations in these lattice diagrams are performed in the same manner as in the lattice diagram described earlier. In FIGS. 3 to 10, for convenience of explanation, the lifting coefficients −α, −β, −γ, −δ and the normalization coefficients κ, 1/κ corresponding to the line segments connecting the lattice points are not displayed.
[0062]
In FIGS. 3 to 10, the input data ..., Y(2n−1), Y(2n), ..., Y(2n+9), ... are converted into the output data ..., X(2n−1), X(2n), ..., X(2n+9), .... For example, the input data Y(2n) is output as output data X(2n) after passing through two stages of intermediate data (lattice points). Hereinafter, the process of generating intermediate data by normalizing the input data is referred to as the normalization process (corresponding to step 1 and step 2 in the above formula (1)), and the process of calculating the other intermediate data is referred to as the conversion process (corresponding to the above step 3 and step 4). In this embodiment and the other embodiments described later, intermediate data of two stages is calculated for each series. However, the present invention is not limited to this, and a lifting configuration that calculates intermediate data of only one stage is also possible. In fact, in the case of a 5 × 3 tap or 13 × 7 tap filter, a lifting configuration that calculates intermediate data of only one stage is possible.
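To make the normalization process and the conversion process concrete, the following Python sketch runs the two steps on a short one-dimensional sequence. It is a software illustration under stated assumptions, not the clocked circuit of this embodiment: the coefficient values are those commonly quoted for the JPEG 2000 9×7 filter, the whole-sample symmetric boundary extension in `mirror` is an assumption, and `forward_97` is included only so that the reversal can be checked. All function names are hypothetical.

```python
ALPHA = -1.586134342059924   # assumed lifting coefficients (commonly quoted 9x7 values)
BETA = -0.052980118572961
GAMMA = 0.882911075530934
DELTA = 0.443506852043971
KAPPA = 1.230174104914001    # assumed normalization coefficient


def mirror(i, n):
    """Whole-sample symmetric extension of index i into 0..n-1 (boundary assumption)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def lift(x, parity, coeff):
    """In place: x[i] += coeff * (x[i-1] + x[i+1]) for every index i of the given parity."""
    n = len(x)
    for i in range(parity, n, 2):
        x[i] += coeff * (x[mirror(i - 1, n)] + x[mirror(i + 1, n)])


def forward_97(samples):
    """Analysis side, included only to produce test input for the inverse."""
    y = list(samples)
    lift(y, 1, ALPHA)
    lift(y, 0, BETA)
    lift(y, 1, GAMMA)
    lift(y, 0, DELTA)
    return [v * (KAPPA if i % 2 else 1.0 / KAPPA) for i, v in enumerate(y)]


def inverse_97(y):
    """Synthesis side: normalization process, then the four conversion passes."""
    # Normalization process: even points times kappa, odd points times 1/kappa.
    x = [v * (1.0 / KAPPA if i % 2 else KAPPA) for i, v in enumerate(y)]
    lift(x, 0, -DELTA)   # even series, second-stage data
    lift(x, 1, -GAMMA)   # odd series, second-stage data
    lift(x, 0, -BETA)    # even series, output data
    lift(x, 1, -ALPHA)   # odd series, output data
    return x


signal = [3.0, 7.0, 1.0, 8.0, 2.0, 9.0, 4.0, 6.0]
reconstructed = inverse_97(forward_97(signal))
```

Each `lift` pass corresponds to one row of lattice points; the hardware described here computes the same lattice point by point rather than pass by pass.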
[0063]
FIGS. 3 to 10 show the contents of the N-th (N is an integer) to (N+7)-th processes in this embodiment. In the N-th process (FIG. 3), the two-point operation of step a (FIG. 38) using the two points of intermediate data $S^{1}_{n+2}$ and $D^{1}_{n+1}$ in the target area A1 is executed within one cycle (one clock cycle) of the pixel clock signal PCLK, and the second-stage temporary data ($S^{2}_{n+2}$) on the series starting from the even-numbered input data Y(2n+4) is calculated. That is, the process of step a is executed using the even-numbered intermediate data $S^{1}_{n+2}$ and the odd-numbered intermediate data $D^{1}_{n+1}$ on the series one point before this intermediate data $S^{1}_{n+2}$.
[0064]
Further, in the clock cycle immediately before the arithmetic processing in the target area A1, a normalization process that multiplies the input data Y(2n+4) by the normalization coefficient κ is performed in the target area N1, and the first-stage intermediate data $S^{1}_{n+2}$ on the series of the input data Y(2n+4) is calculated.
[0065]
The specific contents of the N-th process are as follows. The ring memory 3 shown in FIG. 2 has a storage area of five lines (series) for storing input data, intermediate data, and temporary data, and has a structure in which new data sequentially overwrites the storage areas that hold old data that has already been referenced.
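A minimal model of such a five-line memory with overwrite is sketched below. The class name, the method names, and in particular the rule that maps a series index to a slot (index modulo the number of lines) are assumptions for illustration; the description does not specify the addressing scheme.

```python
class RingMemory:
    """Fixed number of line slots; new data overwrites the slot of already-referenced data."""

    def __init__(self, lines=5):
        self.lines = lines
        self.slots = [None] * lines    # stored data
        self.labels = [None] * lines   # which series each slot currently holds

    def store(self, series, value):
        slot = series % self.lines     # assumed mapping: series index modulo slot count
        self.slots[slot] = value
        self.labels[slot] = series

    def load(self, series):
        slot = series % self.lines
        if self.labels[slot] != series:
            raise KeyError(f"series {series} has been overwritten")
        return self.slots[slot]


mem = RingMemory()
for s in range(5):
    mem.store(s, f"Y({s})")
mem.store(2, "(S2)")   # temporary data overwrites the input it was computed from
mem.store(5, "Y(5)")   # a sixth series reuses the slot that held series 0
```

Because every lattice point is written back over data that is no longer needed, the storage requirement stays at five lines regardless of the length of the input.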
[0066]
First, the processing in the target area N1, which is executed one clock cycle earlier, will be described. The MMU 2 outputs the input data Y(2n+4) temporarily stored in the ring memory 3 to the first data selector 11. The control unit 24 supplies the selection control signal SEL0 to the first data selector 11 and causes it to output the input data Y(2n+4) to the first coefficient multiplier 12. The first coefficient multiplier 12 outputs the normalization coefficient κ, selected according to the control signal C0 supplied from the control unit 24, to the multiplier 14, and the multiplier 14 outputs the data κ × Y(2n+4) = $S^{1}_{n+2}$, obtained by multiplying the input data Y(2n+4) by the normalization coefficient κ, to the delay register 16. The coefficient multiplication process in the first coefficient multiplier 12 is executed within one clock cycle.
[0067]
After one clock cycle, the intermediate data $S^{1}_{n+2}$ held in the delay register 16 is output to the second data selector 17. The MMU 2 also outputs the intermediate data $D^{1}_{n+1}$ temporarily stored in the ring memory 3 to the first data selector 11. In response to the selection control signal SEL0 supplied from the control unit 24, the first data selector 11 outputs the intermediate data $D^{1}_{n+1}$ from the second terminal S1. This output data is input to the second data selector 17. In response to the selection control signal SEL1 supplied from the control unit 24, the second data selector 17 outputs the intermediate data $D^{1}_{n+1}$ from the first terminal S0 to the second coefficient multiplier 18, and outputs the intermediate data $S^{1}_{n+2}$ from the second terminal S1 to the adder 22.
[0068]
The second coefficient multiplier 18 outputs the lifting coefficient δ, selected according to the control signal C1 supplied from the control unit 24, to the multiplier 20, and the multiplier 20 outputs the data δ × $D^{1}_{n+1}$, obtained by multiplying the intermediate data $D^{1}_{n+1}$ by the lifting coefficient δ, to the two's complement arithmetic circuit 21. The two's complement arithmetic circuit 21 is an arithmetic circuit that inverts the sign of its input data, and it outputs −δ × $D^{1}_{n+1}$ to the adder 22. The adder 22 then adds the two points of data −δ × $D^{1}_{n+1}$ and $S^{1}_{n+2}$ to calculate the temporary data ($S^{2}_{n+2}$) and outputs it to the output destination selection unit 23. The calculation of this temporary data ($S^{2}_{n+2}$) is executed within one clock cycle.
[0069]
The output destination selection unit 23 outputs the temporary data ($S^{2}_{n+2}$) to the external MMU 2 from the third terminal K2, which is selected according to the value of the selection control signal SEL2 supplied from the control unit 24. The MMU 2 transfers the temporary data ($S^{2}_{n+2}$) to the ring memory 3, where it overwrites the storage area of the already-referenced input data Y(2n+4).
[0070]
In the next (N+1)-th process (FIG. 4), the two-point operation of step a using the two points of intermediate data $D^{1}_{n+1}$ and $S^{2}_{n+1}$ in the target area A2 is executed within one clock cycle, and the second-stage temporary data ($D^{2}_{n+1}$) on the series starting from the odd-numbered input data Y(2n+3) is calculated. The intermediate data $S^{2}_{n+1}$ is the second-stage data on the series starting from the input data Y(2n+2), one point before the input data Y(2n+3). Specifically, the MMU 2 outputs the already-calculated intermediate data $D^{1}_{n+1}$ and $S^{2}_{n+1}$ from the ring memory 3 to the first data selector 11. Next, under the control of the control unit 24, the first data selector 11 outputs the intermediate data $D^{1}_{n+1}$ and $S^{2}_{n+1}$ from the second and third terminals S1 and S2, respectively, to the second data selector 17. Further, under the control of the control unit 24, the second data selector 17 outputs the intermediate data $S^{2}_{n+1}$ from the first terminal S0 to the second coefficient multiplier 18, and outputs the intermediate data $D^{1}_{n+1}$ from the second terminal S1 to the adder 22.
[0071]
The second coefficient multiplier 18 outputs the lifting coefficient γ, selected according to the control signal C1 supplied from the control unit 24, to the multiplier 20, and the multiplier 20 outputs the data γ × $S^{2}_{n+1}$, obtained by multiplying the intermediate data $S^{2}_{n+1}$ by the lifting coefficient γ, to the two's complement arithmetic circuit 21. The adder 22 adds −γ × $S^{2}_{n+1}$, which is the output of the two's complement arithmetic circuit 21, and $D^{1}_{n+1}$, which is the output from the second data selector 17, to calculate the temporary data ($D^{2}_{n+1}$), and outputs it to the output destination selection unit 23. Under the control of the control unit 24, the output destination selection unit 23 outputs the temporary data ($D^{2}_{n+1}$) from the third terminal K2 to the external MMU 2, and the MMU 2 transfers the temporary data ($D^{2}_{n+1}$) to the ring memory 3, where it overwrites the storage area of the already-referenced intermediate data $D^{1}_{n+1}$.
[0072]
In the next (N+2)-th process (FIG. 5), the two-point operation of step a using the two points of intermediate data $S^{2}_{n+1}$ and $D^{2}_{n}$ in the target area A3 is executed within one clock cycle, and the temporary data of the output (X(2n+2)) on the series starting from the even-numbered input data Y(2n+2) is calculated. The intermediate data $D^{2}_{n}$ is the second-stage intermediate data on the series starting from the input data Y(2n+1), one point before the input data Y(2n+2). Specifically, the MMU 2 outputs the already-calculated intermediate data $S^{2}_{n+1}$ and $D^{2}_{n}$ from the ring memory 3 to the first data selector 11. Next, under the control of the control unit 24, the first data selector 11 outputs the intermediate data $S^{2}_{n+1}$ and $D^{2}_{n}$ from the second and third terminals S1 and S2, respectively, to the second data selector 17. Further, under the control of the control unit 24, the second data selector 17 outputs the intermediate data $D^{2}_{n}$ from the first terminal S0 to the second coefficient multiplier 18, and outputs the intermediate data $S^{2}_{n+1}$ from the second terminal S1 to the adder 22.
[0073]
The second coefficient multiplier 18 multiplies the intermediate data D²_n by the lifting coefficient β and outputs the weighted data β × D²_n to the two's complement arithmetic circuit 21. The adder 22 adds −β × D²_n, which is the output of the two's complement arithmetic circuit 21, and S²_{n+1}, which is the output from the second data selector 17, to calculate the output temporary data (X(2n+2)), and outputs it to the output destination selection unit 23. Under the control of the control unit 24, the output destination selection unit 23 outputs the temporary data (X(2n+2)) from the third terminal K2 to the external MMU 2, and the MMU 2 transfers the temporary data (X(2n+2)) to the ring memory 3 and overwrites the storage area of the already referenced intermediate data S²_{n+1}.
[0074]
In the next (N+3)th process (FIG. 6), the two-point operation of step a is executed within one clock cycle in the target area A4 using the intermediate data D²_n and the output data X(2n), and the output temporary data (X(2n+1)) on the series starting from the odd-numbered input data Y(2n+1) is calculated. Specifically, the MMU 2 causes the already calculated intermediate data D²_n and the output data X(2n) to be output from the ring memory 3 to the first data selector 11. Next, under the control of the control unit 24, the first data selector 11 outputs the intermediate data D²_n and X(2n) from the second and third terminals S1 and S2, respectively, to the second data selector 17. Further, under the control of the control unit 24, the second data selector 17 outputs X(2n) from the first terminal S0 to the second coefficient multiplier 18 and outputs the intermediate data D²_n from the second terminal S1 to the adder 22.
[0075]
The second coefficient multiplier 18 multiplies X(2n) by the lifting coefficient α and outputs the weighted data α × X(2n) to the two's complement arithmetic circuit 21. The adder 22 adds −α × X(2n), which is the output of the two's complement arithmetic circuit 21, and D²_n, which is the output from the second data selector 17, to calculate the output temporary data (X(2n+1)), and outputs it to the output destination selection unit 23. Under the control of the control unit 24, the output destination selection unit 23 outputs the temporary data (X(2n+1)) from the third terminal K2 to the external MMU 2, and the MMU 2 transfers the temporary data (X(2n+1)) to the ring memory 3 and overwrites the storage area of the already referenced intermediate data D²_n.
[0076]
In the next (N+4)th process (FIG. 7), the two-point operation of step b (FIG. 38) is executed within one clock cycle in the target area B1 using the temporary data (S²_{n+2}) and the intermediate data D¹_{n+2}, and the second-stage intermediate data S²_{n+2} on the series starting from the even-numbered input data Y(2n+4) is calculated. The intermediate data D¹_{n+2} is data on the series one point after the series of the temporary data (S²_{n+2}).
[0077]
Further, in the one-clock period before the arithmetic processing in the target area B1 is executed, a normalization process that multiplies the input data Y(2n+5) by the normalization coefficient 1/κ is executed in the target area N2. As a result, the first-stage intermediate data D¹_{n+2} on the series of the input data Y(2n+5) is calculated.
[0078]
The specific processing is described starting from the cycle one clock earlier. In the cycle one clock before the calculation for the target area B1, the MMU 2 causes the input data Y(2n+5) temporarily stored in the ring memory 3 to be output to the first data selector 11. The control unit 24 supplies the selection control signal SEL0 to the first data selector 11 so that the input data Y(2n+5) is output to the first coefficient multiplier 12. Under the control of the control unit 24, the first coefficient multiplier 12 multiplies the input data Y(2n+5) by the normalization coefficient 1/κ and outputs the obtained data (1/κ) × Y(2n+5) = D¹_{n+2} to the delay register 16. The coefficient multiplication process in the first coefficient multiplier 12 is executed within one clock cycle.
[0079]
One clock cycle later, the intermediate data D¹_{n+2} held in the delay register 16 is output to the second data selector 17. The MMU 2 also causes the stored temporary data (S²_{n+2}) to be output to the first data selector 11. The control unit 24 supplies the selection control signal SEL0 to the first data selector 11 so that the temporary data (S²_{n+2}) is output to the second data selector 17.
[0080]
Then, the control unit 24 supplies the selection control signal SEL1 to the second data selector 17 so that the intermediate data D¹_{n+2} is output from the first terminal S0 to the second coefficient multiplier 18 and the temporary data (S²_{n+2}) is output from the second terminal S1 to the adder 22. Further, under the control of the control unit 24, the second data selector 17 outputs the intermediate data D¹_{n+2} from the third terminal S2 to the external MMU 2, and the MMU 2 transfers the intermediate data D¹_{n+2} to the ring memory 3 and overwrites the storage area of the already referenced input data Y(2n+5).
[0081]
In the second coefficient multiplier 18, the lifting coefficient δ selected according to the control signal C1 supplied from the control unit 24 is output to the multiplier 20, and the multiplier 20 multiplies the intermediate data D¹_{n+2} by the lifting coefficient δ and outputs the data δ × D¹_{n+2} to the two's complement arithmetic circuit 21. The two's complement arithmetic circuit 21 outputs the data −δ × D¹_{n+2} to the adder 22. The adder 22 then adds the two points of data, −δ × D¹_{n+2} and the temporary data (S²_{n+2}), and outputs the resulting intermediate data S²_{n+2} to the output destination selection unit 23. This intermediate data S²_{n+2} is calculated within one clock cycle.
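The sign inversion performed by the two's complement arithmetic circuit 21 can be illustrated with fixed-width integers. The following is a minimal sketch, assuming a 16-bit word (the word width is not stated in the text) and hypothetical function names. It shows only that inverting all bits and adding one yields the negated value, so that a plain adder can form "temporary data minus weighted data".

```python
WIDTH = 16                      # assumed word width; not given in the text
MASK = (1 << WIDTH) - 1


def twos_complement_negate(word: int) -> int:
    """Negate a WIDTH-bit word: invert every bit, then add one."""
    return (~word + 1) & MASK


def to_signed(word: int) -> int:
    """Interpret a WIDTH-bit word as a signed integer."""
    return word - (1 << WIDTH) if word & (1 << (WIDTH - 1)) else word


def subtract_via_adder(base: int, product: int) -> int:
    """Compute base - product using only sign inversion and addition."""
    negated = twos_complement_negate(product & MASK)
    return to_signed((base + negated) & MASK)
```

For example, `subtract_via_adder(100, 30)` forms 100 + (−30) in 16-bit arithmetic, which mirrors how the adder 22 receives an already sign-inverted product.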
[0082]
The output destination selection unit 23 outputs the intermediate data S²_{n+2} to the external MMU 2 from the third terminal K2, which is selected according to the value of the selection control signal SEL2 supplied from the control unit 24. The MMU 2 transfers the intermediate data S²_{n+2} to the ring memory 3 and overwrites the storage area of the temporary data (S²_{n+2}).
[0083]
In the next (N+5)th process (FIG. 8), the two-point operation of step b is executed within one clock cycle using the temporary data (D²_{n+1}) and the intermediate data S²_{n+2} of the target area B1 calculated in the (N+4)th process (FIG. 7), and the second-stage intermediate data D²_{n+1} on the series starting from the odd-numbered input data Y(2n+3) is calculated. The intermediate data S²_{n+2} is second-stage intermediate data on the series one point after the series of the temporary data (D²_{n+1}).
[0084]
Specifically, the MMU 2 causes the temporary data (D²_{n+1}) and the intermediate data S²_{n+2} to be output to the first data selector 11. Next, the first data selector 11 outputs the temporary data (D²_{n+1}) and the intermediate data S²_{n+2} from the second and third terminals S1 and S2 to the second data selector 17. Further, under the control of the control unit 24, the second data selector 17 outputs the intermediate data S²_{n+2} from the first terminal S0 to the second coefficient multiplier 18 and outputs the temporary data (D²_{n+1}) from the second terminal S1 to the adder 22. The second coefficient multiplier 18 multiplies the intermediate data S²_{n+2} by the lifting coefficient γ, and the sign of the product is inverted in the two's complement arithmetic circuit 21. The adder 22 adds the intermediate data −γ × S²_{n+2}, weighted by the lifting coefficient −γ, and the temporary data (D²_{n+1}) to calculate the intermediate data D²_{n+1}, and outputs it to the output destination selection unit 23. Under the control of the control unit 24, the output destination selection unit 23 outputs the intermediate data D²_{n+1} from the third terminal K2 to the external MMU 2, and the MMU 2 transfers the intermediate data D²_{n+1} to the ring memory 3 and overwrites the storage area of the temporary data (D²_{n+1}).
[0085]
In the next (N+6)th process (FIG. 9), the two-point operation of step b is executed within one clock cycle using the output temporary data (X(2n+2)) calculated in the (N+2)th process (FIG. 5) and the intermediate data D²_{n+1} of the target area B2 calculated in the (N+5)th process (FIG. 8), and the output data X(2n+2) on the series starting from the even-numbered input data Y(2n+2) is calculated. The intermediate data D²_{n+1} is second-stage intermediate data on the series one point after the series of the output temporary data (X(2n+2)).
[0086]
Specifically, the MMU 2 causes the temporary data (X(2n+2)) and the intermediate data D²_{n+1} to be output from the ring memory 3 to the first data selector 11. Next, under the control of the control unit 24, the first data selector 11 outputs the temporary data (X(2n+2)) and the intermediate data D²_{n+1} from the second and third terminals S1 and S2 to the second data selector 17. Further, under the control of the control unit 24, the second data selector 17 outputs the intermediate data D²_{n+1} from the first terminal S0 to the second coefficient multiplier 18 and outputs the temporary data (X(2n+2)) from the second terminal S1 to the adder 22. The second coefficient multiplier 18 multiplies the intermediate data D²_{n+1} by the lifting coefficient β, and the sign of the product is inverted in the two's complement arithmetic circuit 21. The adder 22 adds the intermediate data −β × D²_{n+1}, weighted by the lifting coefficient −β, and the temporary data (X(2n+2)) to calculate the output data X(2n+2), and outputs it to the output destination selection unit 23. Under the control of the control unit 24, the output destination selection unit 23 outputs the output data X(2n+2) from the second terminal K1 both to the outside and to the external MMU 2. The MMU 2 transfers the output data X(2n+2) to the ring memory 3 and overwrites the storage area of the already referenced temporary data (X(2n+2)).
[0087]
In the next (N+7)th process (FIG. 10), the two-point operation of step b is executed within one clock cycle using the output temporary data (X(2n+1)) calculated in the (N+3)th process (FIG. 6) and the output data X(2n+2) of the target area B4 calculated in the (N+6)th process (FIG. 9), and the output data X(2n+1) on the series starting from the odd-numbered input data Y(2n+1) is calculated. The output data X(2n+2) is output data on the series one point after the series of the output temporary data (X(2n+1)).
[0088]
Specifically, the MMU 2 causes the temporary data (X(2n+1)) and the output data X(2n+2) to be output from the ring memory 3 to the first data selector 11. Next, under the control of the control unit 24, the first data selector 11 outputs the temporary data (X(2n+1)) and the output data X(2n+2) from the second and third terminals S1 and S2 to the second data selector 17. Further, under the control of the control unit 24, the second data selector 17 outputs the output data X(2n+2) from the first terminal S0 to the second coefficient multiplier 18 and outputs the temporary data (X(2n+1)) from the second terminal S1 to the adder 22. The second coefficient multiplier 18 multiplies the output data X(2n+2) by the lifting coefficient α, and the sign of the product is inverted in the two's complement arithmetic circuit 21. The adder 22 adds the data −α × X(2n+2), weighted by the lifting coefficient −α, and the temporary data (X(2n+1)) to calculate the output data X(2n+1), and outputs it to the output destination selection unit 23. Under the control of the control unit 24, the output destination selection unit 23 outputs the output data X(2n+1) from the first terminal K0 to the outside.
[0089]
In the next (N+8)th process (not shown), the same process as the Nth process (FIG. 3) is performed, except that the target area differs. Thereafter, the (N+1)th to (N+7)th processes are repeated. In this way, the same processing as the Nth process (FIG. 3) through the (N+7)th process (FIG. 10) is executed while moving the target area until the output data ..., X(2n−1), X(2n), ... at all points have been calculated.
[0090]
In the present embodiment, as shown in the Nth to (N+3)th processes, the two-point operation of step a is performed until the final output temporary data (X(2n+1)) is calculated. Thereafter, as shown in the (N+4)th to (N+7)th processes, the two-point operation of step b is performed to convert all the temporary data calculated in the Nth to (N+3)th processes into intermediate data or output data.
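The relationship between the two-point operations of step a and step b and a single three-point lifting operation can be sketched as follows. This is a minimal illustration, not the circuit itself: the function names are hypothetical, and the coefficient is passed in unsigned form so that each step subtracts the weighted neighbor, as the adder 22 does after sign inversion.

```python
def step_a(base, first_neighbor, coeff):
    """Step a: fold in the first neighbor, producing temporary data."""
    return base - coeff * first_neighbor


def step_b(temporary, second_neighbor, coeff):
    """Step b: fold in the second neighbor, producing the final value."""
    return temporary - coeff * second_neighbor


def three_point(base, first_neighbor, second_neighbor, coeff):
    """Reference three-point lifting operation: base - coeff * (sum of neighbors)."""
    return base - coeff * (first_neighbor + second_neighbor)
```

For the output data X(2n+2), for instance, `base` corresponds to S²_{n+1}, `first_neighbor` to D²_n (step a, (N+2)th process), `second_neighbor` to D²_{n+1} (step b, (N+6)th process), and `coeff` to β.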
[0091]
As described above, in the wavelet transform method according to the present embodiment, the process of normalizing the input data ..., Y(2n), Y(2n+1), ... and the process of converting the normalized intermediate data into other intermediate data are executed simultaneously in parallel within one clock cycle. The average number of cycles required to calculate one point of output data can therefore be set to four clock cycles, and the output data calculation cycle can be shortened.
[0092]
Next, line-based two-dimensional inverse DWT processing using the wavelet transform apparatus 1 will be described below.
[0093]
As shown in FIG. 11, the subbands (band components) input to the horizontal filtering unit 4A are the subbands 23LL and 23HL, or the subbands 23LH and 23HH. Here, the subband 23LL consists of a horizontal low frequency component (L) and a vertical low frequency component (L), and the subband 23HL consists of a horizontal high frequency component (H) and a vertical low frequency component (L). The subband 23LH consists of a horizontal low frequency component (L) and a vertical high frequency component (H), and the subband 23HH consists of a horizontal high frequency component (H) and a vertical high frequency component (H).
[0094]
When the subbands (band components) input to the horizontal filtering unit 4A are the subbands 23LL and 23HL, or the subbands 23LH and 23HH, the input data ..., Y(n−1), Y(n), Y(n+1), ... shown in FIGS. 3 to 10 are data in which the horizontal data of the subbands 23LL and 23HL are alternately arranged, or data in which the horizontal data of the subbands 23LH and 23HH are alternately arranged. Applying horizontal filtering to the input data composed of the subbands 23LL and 23HL performs horizontal synthesis and outputs the subband 23L. Likewise, applying horizontal filtering to the input data composed of the subbands 23LH and 23HH performs horizontal synthesis and outputs the subband 23H. The output data ..., X(n−1), X(n), X(n+1), ... shown in FIGS. 3 to 10 represent the data of one horizontal line of the subband 23L or the subband 23H.
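The alternate arrangement of subband data into one input sequence can be sketched as follows. This is a minimal illustration under an assumption: the low frequency sample is placed at the even position Y(2n) and the high frequency sample at the odd position Y(2n+1), which matches the use of κ for even-numbered and 1/κ for odd-numbered input data described elsewhere in the text. The function name is hypothetical.

```python
def interleave(low_line, high_line):
    """Build Y(0), Y(1), ... from one line of a low band and one line of a high band.

    Assumption: low band samples land on even indices, high band on odd indices.
    """
    if len(low_line) != len(high_line):
        raise ValueError("both subband lines must have the same length")
    sequence = []
    for low, high in zip(low_line, high_line):
        sequence.extend((low, high))
    return sequence
```

For example, one horizontal line of the subband 23LL and the corresponding line of the subband 23HL, each W/2 points long, would yield a W-point sequence for the horizontal filtering unit 4A.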
[0095]
Next, the subbands input to the vertical filtering unit 4B are the subband 23L and the subband 23H, as shown in the figure. In this case, the input data ..., Y(n−1), Y(n), Y(n+1), ... shown in FIGS. 3 to 10 are data obtained by alternately arranging the data of the subbands 23L and 23H. Applying vertical filtering to the input data consisting of the subbands 23L and 23H performs vertical synthesis and outputs the image data 23. The output data ..., X(n−1), X(n), X(n+1), ... shown in FIGS. 3 to 10 represent the data of one vertical line of the image data 23. The image data 23 is rectangular data having a horizontal pixel count W and a vertical pixel count H.
[0096]
The subbands 23LL, 23HL, 23LH, and 23HH are rectangular data having a vertical pixel count of H/2 and a horizontal pixel count of W/2. As schematically shown in the figure, a data sequence ..., Y_i(2n), Y_i(2n+1), Y_i(2n+2), ... arranged in the vertical direction is input to the horizontal filtering unit 4A, with the subband 23LL of the even-numbered rows and even-numbered columns and the subband 23HL of the even-numbered rows and odd-numbered columns forming one set, or with the subband 23LH of the odd-numbered rows and even-numbered columns and the subband 23HH of the odd-numbered rows and odd-numbered columns forming one set. The subscript i in the input data Y_i(k) indicates the number of the pixel column to which the input data Y_i(k) belongs. The pixel column number i takes the values i = 0, 1, ..., W−1 (W: the number of horizontal pixels). In the figure, the data are divided into two areas, the even-numbered storage area 24L containing the subbands 23LL and 23HL and the odd-numbered storage area 24H containing the subbands 23LH and 23HH, but the arrangement of the data in memory is not limited to this.
[0097]
Specifically, the first ring memory 3A and the horizontal filtering unit 4A repeatedly execute each of the processes from the Nth process (FIG. 3) to the (N+7)th process (FIG. 10) in units of pixels while alternately switching between the low frequency side (storage area 24L side) and the high frequency side (storage area 24H side).
[0098]
For example, for the first pixel row on the storage area 24L side, the Nth process (FIG. 3) is executed once, then the (N+1)th process (FIG. 4) is executed once, then the (N+2)th process (FIG. 5) is executed once, and so on. The same processing is then performed on the first pixel row on the storage area 24H side, then on the second pixel row on the storage area 24L side, then on the second pixel row on the storage area 24H side, then on the third pixel row on the storage area 24L side, then on the third pixel row on the storage area 24H side, and so on. Finally, it is executed for the (H/2)th pixel row on the storage area 24L side and then for the (H/2)th pixel row on the storage area 24H side.
[0099]
Note that, as schematically shown in the figure, the first ring memory 3A has a storage area 26 that holds five points (five pixels) of data corresponding to the input data ..., X_j(k), X_{j+1}(k), ..., and can hold the temporary data and the intermediate data.
[0100]
As a result, the horizontal filtering unit 4A alternately and continuously outputs, one horizontal line at a time, the subband 23L (height H/2), in which the subbands 23LL and 23HL are combined, and the subband 23H (height H/2), in which the subbands 23LH and 23HH are combined. The horizontal lines of the subband 23L are buffered in the L line buffer 5L in the line buffer circuit 5, and the horizontal lines of the subband 23H are buffered in the H line buffer 5H in the line buffer circuit 5.
[0101]
For example, as a result of the (N+6)th process (FIG. 9) being executed continuously for each of the first to Wth pixels, one line of data X_0(2n+2), X_2(2n+2), ..., X_j(2n+2), ..., X_{W−1}(2n+2), in which the (2n+2)th horizontal components are combined, is output continuously and buffered in the L line buffer 5L. Next, as a result of the (N+7)th process (FIG. 10) being executed continuously for each of the first to Wth pixels, one line of data X_0(2n+1), X_2(2n+1), ..., X_j(2n+1), ..., X_{W−1}(2n+1), in which the (2n+1)th horizontal components are combined, is output continuously and buffered in the H line buffer 5H.
[0102]
Under the control of the MMU 2, the line buffer circuit 5 alternately supplies, line by line, one horizontal line held in the L line buffer 5L and one horizontal line held in the H line buffer 5H to the second ring memory 3B. The data output to the second ring memory 3B is processed by the vertical filtering unit 4B.
[0103]
Specifically, the second ring memory 3B and the vertical filtering unit 4B repeatedly execute, in units of horizontal lines, the per-pixel-column processing consisting of the Nth process (FIG. 3) to the (N+7)th process (FIG. 10). For example, the Nth process (FIG. 3) is performed on the 0th pixel column, then on the first pixel column, then on the second pixel column, and so on, and finally on the (W−1)th pixel column. Next, the (N+1)th process (FIG. 4) is performed on the 0th pixel column, then on the first pixel column, then on the second pixel column, and so on, and finally on the (W−1)th pixel column. In this way, each process is executed sequentially for all the pixel columns. As schematically shown in FIG. 12, the second ring memory 3B has a storage area 24 for holding 5 × W points (5 lines) of data corresponding to the input data string, and can hold the temporary data and the intermediate data.
[0104]
As a result, the vertical filtering unit 4B outputs the image data 23 from the data row input in units of horizontal lines.
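The order of the horizontal and vertical synthesis described above can be sketched in software. The following is a simplified illustration, not the line-based hardware pipeline: it processes whole subbands at once rather than streaming lines through ring memories and line buffers, and it takes the one-dimensional synthesis as a caller-supplied function `synth_1d` (a hypothetical parameter). The placement of low frequency data on even positions is likewise an assumption.

```python
def _interleave(first, second):
    """Alternate two equal-length sequences: first on even, second on odd positions."""
    return [value for pair in zip(first, second) for value in pair]


def inverse_2d(ll, hl, lh, hh, synth_1d):
    """Combine four (H/2 x W/2) subbands into an (H x W) image.

    synth_1d is any function mapping an interleaved 1-D sequence to its
    synthesized output of the same length (assumed, supplied by the caller).
    """
    # Horizontal synthesis: 23LL + 23HL -> 23L, and 23LH + 23HH -> 23H.
    band_l = [synth_1d(_interleave(a, b)) for a, b in zip(ll, hl)]
    band_h = [synth_1d(_interleave(a, b)) for a, b in zip(lh, hh)]

    # Vertical synthesis: for each pixel column, alternate 23L and 23H data.
    width = len(band_l[0])
    columns = []
    for c in range(width):
        col_l = [row[c] for row in band_l]
        col_h = [row[c] for row in band_h]
        columns.append(synth_1d(_interleave(col_l, col_h)))

    # Transpose the synthesized columns back into rows of the image.
    return [list(row) for row in zip(*columns)]
```

Passing an identity function for `synth_1d` makes the data movement visible: the four subbands are simply woven together in the positions where the filtered values would appear.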
[0105]
By performing the above processing recursively, it is possible to combine the band components of a decomposition level of arbitrary order and restore the image data. That is, by recursively inputting the subbands LL(k−1), HL(k−1), LH(k−1), and HH(k−1) at the (k−1)th-order decomposition level (k is an integer of 2 or more) to the wavelet transform device 1, the kth-order subband LL(k) can be obtained.
[0106]
As described above, the wavelet transform device 1 according to the present embodiment includes the horizontal filtering unit 4A and the vertical filtering unit 4B having the configuration illustrated in FIG. 2, and thus the output data calculation cycle can be shortened. Therefore, it is possible to perform line-based two-dimensional wavelet transform at high speed in a short time.
[0107]
<Second Embodiment>
Next, a wavelet transform apparatus and a wavelet transform method according to the second embodiment of the present invention will be described. FIG. 14 is a diagram illustrating a schematic configuration of a wavelet transform device 30 according to the second embodiment. The wavelet transform device 30 includes a buffer 34 that temporarily holds the two-dimensional image data of the subbands, an MMU (memory management unit) 31 that operates in synchronization with an externally supplied clock signal CLK, a first ring memory 32A, a horizontal filtering unit 33A, a second ring memory 32B, and a vertical filtering unit 33B. Here, the first ring memory 32A, the horizontal filtering unit 33A, the second ring memory 32B, and the vertical filtering unit 33B operate in synchronization with an externally supplied pixel clock signal PCLK.
[0108]
In the figure, the number of pixels or lines of the first and second ring memories 32A and 32B is shown as 8 or 9, but in the second embodiment the first ring memory 32A is a 9-point ring memory and the second ring memory 32B is a 9-line ring memory.
[0109]
In the present embodiment, the MMU 31, the horizontal filtering unit 33A, and the vertical filtering unit 33B are configured by hardware, but instead may be configured by a computer program including an instruction group executed by a microprocessor.
[0110]
The subband two-dimensional image data input to the wavelet transform device 30 is temporarily stored in the buffer 34. The wavelet transform device 30 has a function of performing line-based two-dimensional inverse DWT once on two-dimensional image data, and synthesizes the (k+1)th-order level subbands 23LL, 23HL, 23LH, and 23HH to generate the kth-order subband 23LL. The horizontal filtering unit 33A and the vertical filtering unit 33B are connected in series via the second ring memory 32B. The subband data is filtered in the horizontal direction by the horizontal filtering unit 33A and then filtered in the vertical direction by the vertical filtering unit 33B. When executing a two-dimensional inverse DWT that synthesizes subbands of second-order or higher decomposition levels, the wavelet transform device 30 may be used repeatedly two or more times.
[0111]
The MMU 31 has a function of controlling data input/output among the buffer 34, the first ring memory 32A, and the second ring memory 32B, and can transfer the subband data read from the buffer 34 to the first ring memory 32A for storage. Specifically, data in which the horizontal data of the subbands 23LL and 23HL are alternately arranged in units of pixels, and data in which the horizontal data of the subbands 23LH and 23HH are alternately arranged in units of pixels, are stored in the first ring memory 32A. By performing horizontal filtering on the data input from the first ring memory 32A, the horizontal filtering unit 33A can calculate, one point per clock cycle of the pixel clock signal PCLK, data in which the high frequency component and the low frequency component are combined, and output it to the second ring memory 32B. Specifically, data obtained by combining the subbands 23LL and 23HL and data obtained by combining the subbands 23LH and 23HH are alternately output and stored in the second ring memory 32B.
[0112]
Next, the MMU 31 inputs data from the second ring memory 32B to the vertical filtering unit 33B. By performing vertical filtering on the input data, the vertical filtering unit 33B calculates and outputs, one point per clock cycle of the pixel clock signal PCLK, data in which the high frequency component and the low frequency component are combined.
[0113]
The configuration of the horizontal filtering unit 33A and the configuration of the vertical filtering unit 33B are the same. FIG. 15 shows a schematic configuration of the filtering unit 33 (horizontal filtering unit 33A or vertical filtering unit 33B). The ring memory 32 shown in FIG. 15 represents one of the first ring memory 32A and the second ring memory 32B shown in FIG.
[0114]
The filtering unit 33 includes a first data selector 35 that selectively captures input data, a first coefficient multiplier 36, a delay register 40, a second data selector 41, a third data selector 42, adders 43, 48, 49, and 54, a second coefficient multiplier 44, a third coefficient multiplier 50, an output destination selection unit (DMUX) 55, and a control unit 56. Among these components, the set of the two adders 43 and 48 and the second coefficient multiplier 44 constitutes a three-point arithmetic unit that processes three points of data within one clock cycle. The set of the two adders 49 and 54 and the third coefficient multiplier 50 also constitutes a three-point arithmetic unit. The two three-point arithmetic units and the output destination selection unit 55 constitute intermediate data calculation means.
[0115]
The control unit 56 operates in synchronization with the pixel clock signal PCLK. The first data selector 35 selectively outputs the data fetched from the ring memory 32 from any of the first terminal S0 to the seventh terminal S6 according to the value of the selection control signal SEL0 supplied from the control unit 56.
[0116]
Data output from the first terminal S0 of the first data selector 35 is input to the first coefficient multiplier 36. In the first coefficient multiplier 36, the coefficient register 37 outputs one of the normalization coefficients 1/κ and κ to the multiplier 38 in accordance with the value of the control signal C0 supplied from the control unit 56, and the multiplier 38 executes, within one clock cycle, a normalization process that multiplies the input data by the normalization coefficient. The data output from the first coefficient multiplier 36 is delayed by one clock cycle of the pixel clock signal PCLK by the delay register 40 and then input to the second data selector 41. The first coefficient multiplier 36 and the delay register 40 constitute the normalization means of the present invention.
[0117]
In the three-point arithmetic unit, the adder 43 adds the two points of data output from the first terminal S0 and the second terminal S1 of the third data selector 42 and outputs the result to the second coefficient multiplier 44. In the second coefficient multiplier 44, the input data is multiplied by one of the lifting coefficients δ and α, selected by the coefficient register 45 in accordance with the value of the control signal C1 supplied from the control unit 56, the sign of the product is inverted by the two's complement arithmetic circuit 47, and the result is output to the adder 48. The adder 48 adds the data input from the third terminal S2 of the third data selector 42 and the data input from the second coefficient multiplier 44 and outputs the result to the output destination selection unit 55.
[0118]
The adder 49 adds the two points of data output from the fourth terminal S3 and the fifth terminal S4 of the third data selector 42 and outputs the result to the third coefficient multiplier 50. In the third coefficient multiplier 50, the input data is multiplied by one of the lifting coefficients β and γ, selected by the coefficient register 51 in accordance with the value of the control signal C2 supplied from the control unit 56, the sign of the product is inverted by the two's complement arithmetic circuit 53, and the result is output to the adder 54. The adder 54 adds the data input from the sixth terminal S5 of the third data selector 42 and the data input from the third coefficient multiplier 50 and outputs the result to the output destination selection unit 55.
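The data path of one three-point arithmetic unit can be modeled stage by stage. The following is a behavioral sketch only, with hypothetical names; it uses ordinary Python numbers in place of the fixed-width signals of the circuit, and the comments map each line to the components named in the text.

```python
def three_point_unit(neighbor_a, neighbor_b, center, coeff):
    """Behavioral model of one three-point arithmetic unit.

    neighbor_a, neighbor_b: the two points fed to the first adder.
    center: the point fed directly to the second adder.
    coeff: the lifting coefficient selected by the coefficient register.
    """
    summed = neighbor_a + neighbor_b   # first adder (43 or 49)
    product = coeff * summed           # coefficient multiplication (44 or 50)
    inverted = -product                # two's complement circuit (47 or 53)
    return center + inverted           # second adder (48 or 54)
```

With X(2n) and X(2n+2) as the neighbors, D²_n as the center, and α as the coefficient, the return value corresponds to the output data X(2n+1).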
[0119]
Depending on the value of the selection control signal SEL3 supplied from the control unit 56, the output destination selection unit 55 outputs the two points of data input in parallel from the adders 48 and 54 from any of the first terminal K0 to the third terminal K2.
[0120]
Further, the data output from the second terminal K1 of the output destination selection unit 55 is branched and output to the external MMU2, and the data output from the third terminal K2 is output to the external MMU2. The MMU 2 can transfer the data output from the second terminal K1 and the third terminal K2 to the outside and store them in the ring memory 32.
[0121]
Next, a typical example of the lifting calculation using the above filtering unit 33 will be described with reference to FIGS. 16 to 19. FIGS. 16 to 19 are lattice diagrams schematically showing the lifting configuration of a 9 × 7 tap Daubechies filter. The calculations in these lattice diagrams are performed in the same manner as in the lattice diagrams described above. For convenience of explanation, FIGS. 16 to 19 do not show the lifting coefficients −α, −β, −γ, −δ or the normalization coefficients κ, 1/κ corresponding to the line segments connecting the lattice points.
[0122]
FIGS. 16 to 19 schematically show the Nth (N is an integer) to (N+3)th processes in this embodiment. In the Nth process (FIG. 16), two conversion processes, one for the target area C1 and one for the target area C2, are executed simultaneously in parallel within one clock cycle. In the target area C2, a three-point operation is executed in which the sum of the two points of output data X(2n) and X(2n+2) is multiplied by the lifting coefficient −α and the product is added to the intermediate data D²_n. As a result, the output data X(2n+1) on the series starting from the odd-numbered input data Y(2n+1) is calculated. The two points of output data X(2n) and X(2n+2) are data on the two series one point before and one point after the series of the intermediate data D²_n. In the target area C1, a three-point operation is executed in which the sum of the two points of intermediate data S²_{n+2} and S²_{n+3} is multiplied by the lifting coefficient −γ and the product is added to the intermediate data D¹_{n+2}. As a result, the second-stage intermediate data D²_{n+2} on the series starting from the odd-numbered input data Y(2n+5) is calculated. Here, the two points of intermediate data S²_{n+2} and S²_{n+3} are data on the two series one point before and one point after the series of the intermediate data D¹_{n+2}.
[0123]
Further, a normalization process that multiplies the input data Y(2n+8) by the normalization coefficient κ is executed in the target area N1 one clock cycle before the calculations in the target areas C1 and C2, and the first-stage intermediate data S¹_{n+4} on the series of the input data Y(2n+8) is calculated.
[0124]
The specific contents of the N-th process are as follows. The ring memory 32 shown in FIG. 15 has a storage area of 9 lines (series) for storing input data, intermediate data, and temporary data, and is structured so that new data sequentially overwrites the storage areas holding old data that has already been referred to.
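The overwrite behavior of such a ring memory can be sketched as follows. This is only an illustrative model: the class name `RingMemory`, its methods, and the slot addressing are hypothetical and are not taken from the description, which does not specify how the MMU 31 addresses the storage area.

```python
class RingMemory:
    """Fixed number of line slots; the oldest slot is reused for new data."""

    def __init__(self, lines=9):
        self.slots = [None] * lines
        self.next_slot = 0  # index of the slot that holds the oldest data

    def push(self, data):
        """Overwrite the oldest slot and return what was stored there."""
        evicted = self.slots[self.next_slot]
        self.slots[self.next_slot] = data
        self.next_slot = (self.next_slot + 1) % len(self.slots)
        return evicted

    def overwrite(self, slot, data):
        """Replace an already-referenced value in place (e.g. Y -> S1)."""
        self.slots[slot] = data


mem = RingMemory(lines=9)
for k in range(9):
    mem.push(f"Y({k})")
evicted = mem.push("Y(9)")  # the tenth line reuses the slot of Y(0)
```

Because a value is only overwritten after it has been referred to, nine slots are enough to hold the inputs, intermediate data, and temporary data that are live at any one time in this embodiment.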
[0125]
The MMU 31 outputs the input data Y(2n+8) temporarily stored in the ring memory 32 to the first data selector 35. The control unit 56 supplies the selection control signal SEL0 to the first data selector 35, which then outputs the input data Y(2n+8) to the first coefficient multiplier 36. In the first coefficient multiplier 36, the coefficient κ is selected from the two normalization coefficients κ and 1/κ according to the control signal C0 supplied from the control unit 56 and is supplied to the multiplier 38. The multiplier 38 outputs the product of the input data and the normalization coefficient κ (= κ × Y(2n+8) = S¹_{n+4}) to the delay register 40. The coefficient multiplication process in the first coefficient multiplier 36 is executed within one clock cycle.
[0126]
One clock cycle after the coefficient multiplication process, the intermediate data S¹_{n+4} stored in the delay register 40 is output to the second data selector 41. In accordance with the selection control signal SEL1 supplied from the control unit 56, the second data selector 41 outputs the intermediate data S¹_{n+4} from the second terminal S1 to the MMU 31, and the MMU 31 transfers the intermediate data S¹_{n+4} to the ring memory 32, overwriting the storage area of the input data Y(2n+8) that has already been referred to. In the same clock cycle in which the intermediate data S¹_{n+4} is output to the MMU 31, the MMU 31 outputs the six points of data X(2n), D²_n, X(2n+2), S²_{n+2}, D¹_{n+2}, S²_{n+3} temporarily stored in the ring memory 32 to the first data selector 35. The first data selector 35 outputs the six points of data from the second terminal S1 to the seventh terminal S6 according to the value of the selection control signal SEL0 supplied from the control unit 56, and this output is input to the third data selector 42. According to the value of the selection control signal SEL2 supplied from the control unit 56, the third data selector 42 outputs the three points of data X(2n), X(2n+2), D²_n in the target area C2 from the first terminal S0 to the third terminal S2, respectively, and outputs the three points of data S²_{n+2}, S²_{n+3}, D¹_{n+2} in the target area C1 from the fourth terminal S3 to the sixth terminal S5, respectively.
[0127]
The upper adder 43 adds the two points of data X(2n) and X(2n+2) input from the first terminal S0 and the second terminal S1 of the third data selector 42 and outputs the sum to the second coefficient multiplier 44. In the second coefficient multiplier 44, the coefficient register 45 selects the lifting coefficient α from the two lifting coefficients α and δ in accordance with the control signal C1 supplied from the control unit 56 and supplies it to the multiplier 46. The multiplier 46 outputs the product of the input data and the lifting coefficient α (= α × (X(2n) + X(2n+2))) to the two's complement arithmetic circuit 47, which inverts the sign of the product and outputs it to the adder 48. The adder 48 adds the value input from the second coefficient multiplier 44 and the intermediate data D²_n input from the third terminal S2 of the third data selector 42 to calculate the output data X(2n+1) in the target area C2, and outputs it to the output destination selection unit 55. The calculation of the output data X(2n+1) is executed within one clock cycle.
[0128]
Meanwhile, the lower adder 49 adds the two points of intermediate data S²_{n+2} and S²_{n+3} input from the fourth terminal S3 and the fifth terminal S4 of the third data selector 42 and outputs the sum to the third coefficient multiplier 50. In the third coefficient multiplier 50, the coefficient register 51 selects the lifting coefficient γ from the two lifting coefficients β and γ in accordance with the control signal C2 supplied from the control unit 56 and supplies it to the multiplier 52. The multiplier 52 outputs the product of the input data and the lifting coefficient γ (= γ × (S²_{n+2} + S²_{n+3})) to the two's complement arithmetic circuit 53, which inverts the sign of the product and outputs it to the adder 54. The adder 54 adds the value input from the third coefficient multiplier 50 and the intermediate data D¹_{n+2} input from the sixth terminal S5 of the third data selector 42 to calculate the intermediate data D²_{n+2} in the target area C1, and outputs it to the output destination selection unit 55. The calculation of the intermediate data D²_{n+2} is executed within one clock cycle.
[0129]
In accordance with the value of the selection control signal SEL3 supplied from the control unit 56, the output destination selection unit 55 outputs the output data X(2n+1) input from the adder 48 from the first terminal K0, and outputs the intermediate data D²_{n+2} input from the other adder 54 from the third terminal K2 to the external MMU 31. The MMU 31 transfers the intermediate data D²_{n+2} to the ring memory 32, overwriting the storage area of the intermediate data D¹_{n+2} that has already been referred to.
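The adder, multiplier, two's complement circuit, and adder chain described for the target areas C1 and C2 can be modeled roughly as below. This is a sketch under assumptions: the 32-bit word width, the 16 fractional bits of the fixed-point coefficient, and all function names are hypothetical, since this part of the description does not state the arithmetic format.

```python
WIDTH = 32                      # assumed word width (not given in the text)
MASK = (1 << WIDTH) - 1


def to_signed(value):
    """Interpret a WIDTH-bit pattern as a signed integer."""
    value &= MASK
    return value - (1 << WIDTH) if value & (1 << (WIDTH - 1)) else value


def twos_complement_negate(value):
    """Sign inversion as done by circuits 47 and 53: invert bits, add one."""
    return to_signed((~value + 1) & MASK)


def lifting_datapath(before, after, center, coef_q, frac_bits=16):
    """center - coef * (before + after) with a fixed-point coefficient."""
    summed = before + after                      # adder 43 or 49
    product = (coef_q * summed) >> frac_bits     # multiplier 46 or 52
    negated = twos_complement_negate(product)    # circuit 47 or 53
    return center + negated                      # adder 48 or 54


# Example with a coefficient of exactly 2.0 in the assumed Q16 format.
result = lifting_datapath(3, 5, 100, 2 << 16)
```

Storing the positive coefficient and negating after the multiplication, as the description does, lets one coefficient register serve both the forward and inverse sign conventions.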
[0130]
Next, the conversion processes of the target areas C3 and C4 in the (N+1)-th process (FIG. 17) are performed. In the target area C3, a three-point operation is executed: the sum of the two points of intermediate data D¹_{n+3} and D¹_{n+4} is multiplied by the lifting coefficient −δ, and the resulting product is added to the intermediate data S¹_{n+4}. As a result, the second-stage intermediate data S²_{n+4} on the series starting from the even-numbered input data Y(2n+8) is calculated. Here, the two points of intermediate data D¹_{n+3} and D¹_{n+4} lie on the two series one point before and one point after the series of the intermediate data S¹_{n+4}. In the target area C4, a three-point operation is executed: the sum of the two points of intermediate data D²_{n+1} and D²_{n+2} is multiplied by the lifting coefficient −β, and the resulting product is added to the intermediate data S²_{n+2}. As a result, the output data X(2n+4) on the series starting from the even-numbered input data Y(2n+4) is calculated. Here, the two points of intermediate data D²_{n+1} and D²_{n+2} lie on the two series one point before and one point after the series of the intermediate data S²_{n+2}.
[0131]
In addition, the process in the target area N2 is executed one clock cycle before the calculations in the target areas C3 and C4. In the target area N2, a normalization process that multiplies the input data Y(2n+9) by the normalization coefficient 1/κ is executed, and the intermediate data D¹_{n+4} is output.
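Taken together, the normalization steps (κ, 1/κ) and the four three-point operations (−δ, −γ, −β, −α) described so far amount to a one-dimensional inverse lifting pass. The following is a minimal software sketch of that arithmetic, not of the pipelined hardware. The numeric constants are the standard JPEG 2000 9/7 values and are an assumption here, since this section gives no numbers; the boundary handling (reusing the nearest neighbor) and all function names are also illustrative choices. `forward_97` is included only so the round trip can be checked.

```python
# Standard JPEG 2000 irreversible 9/7 lifting constants (assumed here;
# this section of the description does not state numeric values).
ALPHA = -1.586134342
BETA = -0.052980118
GAMMA = 0.882911075
DELTA = 0.443506852
KAPPA = 1.230174105


def lift(center, before, after, coef):
    """Three-point operation: center + (-coef) * (before + after)."""
    return center - coef * (before + after)


def prev_index(i):
    return max(i - 1, 0)            # assumed boundary rule on the left


def next_index(i, half):
    return min(i + 1, half - 1)     # assumed boundary rule on the right


def inverse_97(y):
    """Inverse lifting; y[2i] is low-band data, y[2i+1] high-band data."""
    half = len(y) // 2
    idx = range(half)
    s = [KAPPA * y[2 * i] for i in idx]                  # S1 = k * Y(2i)
    d = [y[2 * i + 1] / KAPPA for i in idx]              # D1 = Y(2i+1) / k
    s = [lift(s[i], d[prev_index(i)], d[i], DELTA) for i in idx]        # S2
    d = [lift(d[i], s[i], s[next_index(i, half)], GAMMA) for i in idx]  # D2
    s = [lift(s[i], d[prev_index(i)], d[i], BETA) for i in idx]         # X(2i)
    d = [lift(d[i], s[i], s[next_index(i, half)], ALPHA) for i in idx]  # X(2i+1)
    out = []
    for even, odd in zip(s, d):
        out += [even, odd]
    return out


def forward_97(x):
    """Forward pass written as the exact reverse of inverse_97."""
    half = len(x) // 2
    idx = range(half)
    s = [x[2 * i] for i in idx]
    d = [x[2 * i + 1] for i in idx]
    d = [lift(d[i], s[i], s[next_index(i, half)], -ALPHA) for i in idx]
    s = [lift(s[i], d[prev_index(i)], d[i], -BETA) for i in idx]
    d = [lift(d[i], s[i], s[next_index(i, half)], -GAMMA) for i in idx]
    s = [lift(s[i], d[prev_index(i)], d[i], -DELTA) for i in idx]
    out = []
    for even, odd in zip(s, d):
        out += [even / KAPPA, odd * KAPPA]
    return out


samples = [3.0, 7.0, 1.0, 8.0, 2.0, 9.0, 4.0, 6.0]
restored = inverse_97(forward_97(samples))
```

Each list comprehension corresponds to one stage of the lattice; the hardware instead evaluates one node from each of two stages per clock, which is why only a few series need to be held at a time.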
[0132]
The specific contents of the (N+1)-th process are as follows. First, the processing of the target area N2, which is executed one clock cycle earlier, will be described. The MMU 31 outputs the input data Y(2n+9) temporarily stored in the ring memory 32 to the first data selector 35. The control unit 56 supplies the selection control signal SEL0 to the first data selector 35, which then outputs the input data Y(2n+9) to the first coefficient multiplier 36. In the first coefficient multiplier 36, the coefficient 1/κ is selected from the two normalization coefficients κ and 1/κ according to the control signal C0 supplied from the control unit 56 and is supplied to the multiplier 38. The multiplier 38 outputs the product of the input data and the normalization coefficient 1/κ (= (1/κ) × Y(2n+9) = D¹_{n+4}) to the delay register 40. The coefficient multiplication process in the first coefficient multiplier 36 is executed within one clock cycle.
[0133]
One clock cycle after the coefficient multiplication process, the intermediate data D¹_{n+4} stored in the delay register 40 is output to the second data selector 41. In accordance with the selection control signal SEL1 supplied from the control unit 56, the second data selector 41 outputs the intermediate data D¹_{n+4} from the first terminal S0 to the third data selector 42 and also from the second terminal S1 to the MMU 31, and the MMU 31 transfers the intermediate data D¹_{n+4} to the ring memory 32, overwriting the storage area of the input data Y(2n+9) that has already been referred to. In the same clock cycle in which the intermediate data D¹_{n+4} is output to the third data selector 42, the MMU 31 outputs the five points of data D²_{n+1}, S²_{n+2}, D²_{n+2}, D¹_{n+3}, S¹_{n+4} temporarily stored in the ring memory 32 to the first data selector 35. The first data selector 35 outputs the five points of data from the second terminal S1 to the sixth terminal S5 according to the value of the selection control signal SEL0 supplied from the control unit 56, and this output is input to the third data selector 42. Of the five points of data, the third data selector 42 outputs the three points of data D²_{n+1}, D²_{n+2}, S²_{n+2} in the target area C4 from the fourth terminal S3 to the sixth terminal S5, respectively. It also outputs the two points of data D¹_{n+3} and S¹_{n+4} in the target area C3, together with the data D¹_{n+4} input from the second data selector 41, from the first terminal S0 to the third terminal S2 in the order D¹_{n+3}, D¹_{n+4}, S¹_{n+4}.
[0134]
The upper adder 43 adds the two points of data D¹_{n+3} and D¹_{n+4} input from the first terminal S0 and the second terminal S1 of the third data selector 42 and outputs the sum to the second coefficient multiplier 44. In the second coefficient multiplier 44, the coefficient register 45 selects the lifting coefficient δ from the two lifting coefficients α and δ according to the control signal C1 supplied from the control unit 56 and supplies it to the multiplier 46. The multiplier 46 outputs the product of the input data and the lifting coefficient δ (= δ × (D¹_{n+3} + D¹_{n+4})) to the two's complement arithmetic circuit 47, which inverts the sign of the product and outputs it to the adder 48. The adder 48 adds the value input from the second coefficient multiplier 44 and the intermediate data S¹_{n+4} input from the third terminal S2 of the third data selector 42 to calculate the intermediate data S²_{n+4} in the target area C3, and outputs it to the output destination selection unit 55. The calculation of the intermediate data S²_{n+4} is executed within one clock cycle.
[0135]
Meanwhile, the lower adder 49 adds the two points of intermediate data D²_{n+1} and D²_{n+2} input from the fourth terminal S3 and the fifth terminal S4 of the third data selector 42 and outputs the sum to the third coefficient multiplier 50. In the third coefficient multiplier 50, the coefficient register 51 selects the lifting coefficient β from the two lifting coefficients β and γ according to the control signal C2 supplied from the control unit 56 and supplies it to the multiplier 52. The multiplier 52 outputs the product of the input data and the lifting coefficient β (= β × (D²_{n+1} + D²_{n+2})) to the two's complement arithmetic circuit 53, which inverts the sign of the product and outputs it to the adder 54. The adder 54 adds the value input from the third coefficient multiplier 50 and the intermediate data S²_{n+2} input from the sixth terminal S5 of the third data selector 42 to calculate the output data X(2n+4) in the target area C4, and outputs it to the output destination selection unit 55. The calculation of the output data X(2n+4) is executed within one clock cycle.
[0136]
In accordance with the value of the selection control signal SEL3 supplied from the control unit 56, the output destination selection unit 55 outputs the output data X(2n+4) input from the adder 54 from the second terminal K1, and outputs the intermediate data S²_{n+4} input from the other adder 48 from the third terminal K2 to the external MMU 31. The MMU 31 transfers the intermediate data S²_{n+4} to the ring memory 32, overwriting the storage area of the intermediate data S¹_{n+4} that has already been referred to. Further, the output data X(2n+4) output from the second terminal K1 is branched and also output to the external MMU 31. The MMU 31 transfers the output data X(2n+4) to the ring memory 32, overwriting the storage area of the intermediate data S²_{n+2} that has already been referred to.
[0137]
Next, the conversion processes of the target areas C5 and C6 in the (N+2)-th process (FIG. 18) are executed. In addition, the normalization process of the target area N3 is executed one clock cycle before the calculations in the target areas C5 and C6. Here, the target areas C5, C6, and N3 are the areas obtained by moving the target areas C1, C2, and N1 of the N-th process (FIG. 16) backward by two series (two points), respectively, and processing similar to that in the target areas C1, C2, and N1 is executed in them. Accordingly, in the target area N3, a normalization process that multiplies the even-numbered input data Y(2n+10) by the normalization coefficient κ is performed, and the intermediate data S¹_{n+5} is calculated. In the target area C5, a three-point operation is executed: the sum of the two points of intermediate data S²_{n+3} and S²_{n+4} is multiplied by the lifting coefficient −γ, and the resulting product is added to the intermediate data D¹_{n+3}. As a result, the second-stage intermediate data D²_{n+3} on the series starting from the odd-numbered input data Y(2n+7) is calculated. In the target area C6, a three-point operation is executed: the sum of the two points of output data X(2n+2) and X(2n+4) is multiplied by the lifting coefficient −α, and the resulting product is added to the intermediate data D²_{n+1}. As a result, the output data X(2n+3) on the series starting from the odd-numbered input data Y(2n+3) is calculated.
[0138]
Next, the conversion processes of the target areas C7 and C8 in the (N+3)-th process (FIG. 19) are executed. In addition, the normalization process of the target area N4 is executed one clock cycle before the calculations in the target areas C7 and C8. Here, the target areas C7, C8, and N4 are the areas obtained by moving the target areas C3, C4, and N2 of the (N+1)-th process (FIG. 17) backward by two series (two points), respectively, and processing similar to that in the target areas C3, C4, and N2 is executed in them. Accordingly, in the target area N4, a normalization process that multiplies the input data Y(2n+11) by the normalization coefficient 1/κ is executed, and the intermediate data D¹_{n+5} is calculated. In the target area C7, a three-point operation is executed: the sum of the two points of odd-numbered intermediate data D¹_{n+4} and D¹_{n+5} is multiplied by the lifting coefficient −δ, and the resulting product is added to the even-numbered intermediate data S¹_{n+5}. As a result, the second-stage intermediate data S²_{n+5} on the series starting from the even-numbered input data Y(2n+10) is calculated. In the target area C8, a three-point operation is executed: the sum of the two points of intermediate data D²_{n+2} and D²_{n+3} is multiplied by the lifting coefficient −β, and the resulting product is added to the intermediate data S²_{n+3}. As a result, the output data X(2n+6) on the series starting from the even-numbered input data Y(2n+6) is calculated.
[0139]
As described above, processing equivalent to the N-th process (FIG. 16) and the (N+1)-th process (FIG. 17) is repeated while the target areas are moved, until the output data of all points has been calculated. As a result, the average period required to calculate one point of even-numbered or odd-numbered output data is one clock cycle, which greatly shortens the output data calculation period.
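The repeating pattern of FIGS. 16 to 19 can be tabulated with a small helper. This is a sketch for checking the index bookkeeping only: the function name `schedule` and the string labels are illustrative, and the normalization, which the description places one clock cycle before the three-point operations, is simply listed with the process it belongs to.

```python
def schedule(step):
    """Operations of process N+step, as index offsets relative to n.

    Returns (normalized input, [results of the two three-point operations]).
    """
    j, odd = divmod(step, 2)
    if not odd:
        # Pattern of the N-th process (FIG. 16): kappa, then -alpha and -gamma.
        return (f"Y(2n+{8 + 2 * j})",
                [f"X(2n+{1 + 2 * j})", f"D2(n+{2 + j})"])
    # Pattern of the (N+1)-th process (FIG. 17): 1/kappa, then -delta and -beta.
    return (f"Y(2n+{9 + 2 * j})",
            [f"S2(n+{4 + j})", f"X(2n+{4 + 2 * j})"])


for step in range(4):
    print(step, schedule(step))
```

Every process yields exactly one output sample X, alternating between odd and even positions, which matches the stated average of one output point per clock cycle.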
[0140]
Next, line-based two-dimensional inverse DWT processing using the wavelet transform device 30 will be described below.
[0141]
As shown in FIG. 11, the subbands (band components) input to the horizontal filtering unit 33A are subbands 23LL and 23HL, or subbands 23LH and 23HH.
[0142]
The input data ..., Y(n−1), Y(n), Y(n+1), ... shown in FIGS. 16 to 19 is data obtained by alternately arranging the horizontal data of the subbands 23LL and 23HL, or data obtained by alternately arranging the horizontal data of the subbands 23LH and 23HH. Applying horizontal filtering to the input data consisting of the subbands 23LL and 23HL outputs the subband 23L, and applying horizontal filtering to the input data consisting of the subbands 23LH and 23HH outputs the subband 23H. The output data ..., X(n−1), X(n), X(n+1), ... shown in FIGS. 16 to 19 is the data string of one horizontal line of the subband 23L or the subband 23H.
[0143]
Next, the subbands input to the vertical filtering unit 33B are the subband 23L and the subband 23H. In this case, the input data ..., Y(n−1), Y(n), Y(n+1), ... shown in FIGS. 16 to 19 is data obtained by alternately arranging the vertical data of the subbands 23L and 23H. The image data 23 is output by applying vertical filtering to the input data consisting of the subbands 23L and 23H, and the output data ..., X(n−1), X(n), X(n+1), ... shown in FIGS. 16 to 19 is the data string of one vertical line of the image data 23. The image data 23 is rectangular data with W horizontal pixels and H vertical pixels.
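The order of the two passes described above (horizontal filtering on interleaved subband rows, then vertical filtering on interleaved L and H data) can be sketched as follows. This is a whole-image software model rather than the line-based hardware flow; `interleave`, `inverse_2d`, and the caller-supplied `inverse_1d` parameter are hypothetical names introduced for illustration.

```python
def interleave(low, high):
    """Alternate samples: low at even positions, high at odd positions."""
    out = []
    for a, b in zip(low, high):
        out += [a, b]
    return out


def inverse_2d(ll, hl, lh, hh, inverse_1d):
    """Combine four subbands (lists of rows) into one image.

    inverse_1d is a caller-supplied 1-D synthesis function (hypothetical
    parameter) that maps an interleaved sequence to reconstructed samples.
    """
    # Horizontal pass: LL+HL rows give subband L, LH+HH rows give subband H.
    band_l = [inverse_1d(interleave(r0, r1)) for r0, r1 in zip(ll, hl)]
    band_h = [inverse_1d(interleave(r0, r1)) for r0, r1 in zip(lh, hh)]
    # Vertical pass: for each pixel column, interleave L and H samples.
    width = len(band_l[0])
    columns = []
    for c in range(width):
        col_l = [row[c] for row in band_l]
        col_h = [row[c] for row in band_h]
        columns.append(inverse_1d(interleave(col_l, col_h)))
    # Transpose the reconstructed columns back into image rows.
    return [list(row) for row in zip(*columns)]


# With an identity "filter" the result shows only the interleaving layout.
layout = inverse_2d([[1]], [[2]], [[3]], [[4]], lambda seq: list(seq))
```

Passing a real one-dimensional inverse lifting function as `inverse_1d` would turn this layout sketch into an actual synthesis step for one decomposition level.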
[0144]
The subbands 23LL, 23HL, 23LH, and 23HH are each rectangular data with H/2 vertical pixels and W/2 horizontal pixels. As schematically shown in the figure, a data sequence Y_i(2n), Y_i(2n+1), Y_i(2n+2), ... arranged in the vertical direction, with the even-row, even-column subband 23LL and the even-row, odd-column subband 23HL as one set, or with the odd-row, even-column subband 23LH and the odd-row, odd-column subband 23HH as one set, is input to the horizontal filtering unit 33A. That is, each pixel row (horizontal data row in the figure) in the storage area 58L is a data string in which the pixels of the horizontal lines of the subbands 23LL and 23HL are alternately arranged, and each pixel row (horizontal data row in the figure) in the storage area 58H is a data string in which the pixels of the horizontal lines of the subbands 23LH and 23HH are alternately arranged. The subscript i of the input data Y_i(k) indicates the number of the pixel column to which Y_i(k) belongs, and takes the values i = 0, 1, ..., W−1 (W: number of horizontal pixels). In the figure, the data is shown divided into two areas, the even-numbered storage area 58L containing the subbands 23LL and 23HL and the odd-numbered storage area 58H containing the subbands 23LH and 23HH, but the arrangement of the data in memory is not limited to this.
[0145]
Specifically, the first ring memory 32A and the horizontal filtering unit 33A repeatedly execute, for each pixel, the processes consisting of the N-th process (FIG. 16) and the (N+1)-th process (FIG. 17), switching alternately between the low-frequency side (storage area 58L side) and the high-frequency side (storage area 58H side).
[0146]
For example, for the first pixel row on the storage area 58L side, the N-th process (FIG. 16) is executed once, then the (N+1)-th process (FIG. 17) is executed once, then the (N+2)-th process (FIG. 18) is executed once, and so on. The same processing is then executed for the first pixel row on the storage area 58H side, then for the second pixel row on the storage area 58L side, then for the second pixel row on the storage area 58H side, then for the third pixel row on the storage area 58L side, then for the third pixel row on the storage area 58H side, and so on. Finally, it is executed for the (H/2)-th pixel row on the storage area 58L side and then for the (H/2)-th pixel row on the storage area 58H side.
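The row ordering just described can be written down as a short generator. The function name and the tuple labels are illustrative only; H is assumed to be even, as implied by the H/2 rows of each storage area.

```python
def horizontal_row_order(height):
    """Yield (storage area, row number) in the order rows are filtered."""
    for row in range(1, height // 2 + 1):
        yield ("58L", row)   # low-frequency side: subbands 23LL and 23HL
        yield ("58H", row)   # high-frequency side: subbands 23LH and 23HH


order = list(horizontal_row_order(6))
```

Alternating the two sides in this way is what lets the horizontal stage emit lines of the subbands 23L and 23H in exactly the interleaved order the vertical stage consumes.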
[0147]
Note that, as schematically shown in the figure, the first ring memory 32A has a storage area 59 for holding 9 points (9 pixels) of data corresponding to the input data ..., X_j(k), X_{j+1}(k), ..., and can hold the temporary data and intermediate data.
[0148]
As a result, the horizontal filtering unit 33A alternately and continuously outputs, one horizontal line at a time, the subband 23L (H/2 lines high) obtained by combining the subbands 23LL and 23HL and the subband 23H (H/2 lines high) obtained by combining the subbands 23LH and 23HH.
[0149]
Then, the data in which the horizontal lines of the subband 23L and the horizontal lines of the subband 23H are alternately arranged is output to the second ring memory 32B as it is as vertical line data, and is processed by the vertical filtering unit 33B.
[0150]
Specifically, the second ring memory 32B and the vertical filtering unit 33B repeatedly execute, in units of horizontal lines, the processes consisting of the N-th process (FIG. 16) and the (N+1)-th process (FIG. 17) for each pixel column. For example, the N-th process (FIG. 16) is performed on the 0th pixel column, then on the first pixel column, then on the second pixel column, and so on, and finally on the (W−1)-th pixel column. Next, the (N+1)-th process (FIG. 17) is performed on the 0th pixel column, then on the first pixel column, then on the second pixel column, and so on, and finally on the (W−1)-th pixel column. In this way, each process is executed in turn for all the pixel columns. As schematically shown in FIG. 20, the second ring memory 32B has a storage area 58 for holding 9 × W points (9 lines) of data corresponding to the input data string, and can hold the temporary data and intermediate data.
[0151]
As a result, the vertical filtering unit 33B outputs the image data 23 from the data row input in units of horizontal lines.
[0152]
By performing the above processing recursively, band components at a decomposition level of any order can be combined to restore the image data. That is, by recursively inputting the subbands LL(k+1), HL(k+1), LH(k+1), and HH(k+1) at decomposition level k+1 (k is an integer) to the wavelet transform device 30, the subband LL(k) at decomposition level k can be obtained.
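The recursion over decomposition levels can be sketched as a simple loop. The names `reconstruct`, `details`, and `synthesize` are hypothetical; `synthesize` stands for one pass of the two-dimensional inverse transform and is supplied by the caller.

```python
def reconstruct(ll, details, synthesize):
    """Rebuild data from the deepest level upward.

    details is a list of (hl, lh, hh) tuples ordered from the deepest
    decomposition level to level 1; synthesize(ll, hl, lh, hh) returns the
    LL subband of the next shallower level (or the image at the last step).
    """
    for hl, lh, hh in details:
        ll = synthesize(ll, hl, lh, hh)
    return ll


# Toy stand-in for the 2-D synthesis: just sums its inputs.
total = reconstruct(1, [(1, 1, 1), (2, 2, 2)], lambda a, b, c, d: a + b + c + d)
```

Each iteration consumes the three detail subbands of one level, so the same hardware can be reused for every level by feeding its output back as the next LL input.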
[0153]
As described above, the wavelet transform device 30 according to the present embodiment includes the horizontal filtering unit 33A and the vertical filtering unit 33B having the configuration illustrated in FIG. 15, so the output data calculation cycle can be shortened. It is therefore possible to perform the line-based two-dimensional wavelet transform at high speed.
[0154]
Furthermore, the second embodiment does not need the buffer for storing the output of the horizontal filtering unit that was required in the first embodiment. In the first embodiment, the horizontal filtering unit 4A outputs one pixel every four clocks and the vertical filtering unit 4B inputs one pixel every four clocks. However, the horizontal filtering unit 4A outputs vertical lines consecutively in the (N+6)-th process (FIG. 9) and the (N+7)-th process (FIG. 10), whereas the vertical filtering unit 4B, after a vertical line is input in the N-th process (FIG. 3), receives no vertical line until the (N+4)-th process (FIG. 7). For this reason, a buffer was necessary. In the second embodiment, by contrast, the horizontal filtering unit 33A outputs a vertical line in every process and the vertical filtering unit 33B inputs a vertical line in every process, so no buffer is needed.
[0155]
<Third Embodiment>
Next, a wavelet transform apparatus and a wavelet transform method according to the third embodiment of the present invention will be described. The wavelet transform apparatus according to the present embodiment has the same configuration as the wavelet transform apparatus 30 (FIG. 14) according to the second embodiment, except for the horizontal filtering unit and the vertical filtering unit. However, whereas the first and second ring memories 32A and 32B in the second embodiment are 9-point and 9-line ring memories, respectively, in this embodiment they are 8-point and 8-line ring memories, respectively.
[0156]
FIG. 22 is a diagram illustrating a schematic configuration of the filtering unit 33s according to the third embodiment. The filtering unit 33s represents either the horizontal filtering unit or the vertical filtering unit, and the ring memory 32s represents either the first ring memory 32A or the second ring memory 32B described above.
[0157]
The filtering unit 33s selectively receives input data from the ring memory 32s and includes first and second data selectors 60 and 65, a delay register 64, first to fifth coefficient multipliers 61, 66, 71, 76, and 81, adders 70, 75, 80, and 85, an output destination selection unit (DMUX) 86, and a control unit 87. Among these components, the set of the second coefficient multiplier 66 and the adder 70 constitutes a two-point arithmetic unit that processes two points of data by the method of step a or step b (FIG. 38). Likewise, the set of the third coefficient multiplier 71 and the adder 75, the set of the fourth coefficient multiplier 76 and the adder 80, and the set of the fifth coefficient multiplier 81 and the adder 85 each constitute a two-point arithmetic unit. The two-point arithmetic units and the output destination selection unit 86 constitute intermediate data calculation means.
[0158]
The control unit 87 operates in synchronization with the pixel clock signal PCLK. The first data selector 60 selectively outputs the data fetched from the ring memory 32s from any one of the first terminal S0 to the eighth terminal S7 according to the value of the selection control signal SEL0 supplied from the control unit 87.
[0159]
Data output from the first terminal S0 of the first data selector 60 is input to the first coefficient multiplier 61. In the first coefficient multiplier 61, one of the normalization coefficients κ and 1/κ is output to the multiplier 63 in accordance with the value of the control signal C0 supplied from the control unit 87, and the multiplier 63 multiplies the input data by that normalization coefficient. The output data from the multiplier 63 is input to the delay register 64. The normalization process in the first coefficient multiplier 61 is executed within one clock cycle. The first coefficient multiplier 61 and the delay register 64 constitute normalization means. The output of the delay register 64 is input to the second data selector 65 and is also branched and input to the MMU 31.
[0160]
The second data selector 65 selectively outputs the data fetched from the delay register 64 and the first data selector 60 from any of the first terminal S0 to the eighth terminal S7 in accordance with the value of the selection control signal SEL1 supplied from the control unit 87. The second to fifth coefficient multipliers 66, 71, 76, and 81 are circuits that multiply the input data by the lifting coefficients −α, −β, −γ, and −δ, respectively, according to the control signals C1 to C4. The coefficient registers 67, 72, 77, and 82 receive the control signals C1 to C4 and output the lifting coefficients α, β, γ, and δ to the multipliers 68, 73, 78, and 83, respectively. The multipliers 68, 73, 78, and 83 multiply the data input from the output terminals S0, S2, S4, and S6 of the second data selector 65 by the lifting coefficients α, β, γ, and δ, respectively. The two's complement arithmetic circuits 69, 74, 79, and 84 invert the signs of the output data from the multipliers 68, 73, 78, and 83, respectively. The adders 70, 75, 80, and 85 add the data input from the second to fifth coefficient multipliers 66, 71, 76, and 81 to the data input from the output terminals S1, S3, S5, and S7 of the second data selector 65, respectively, and output the sums to the output destination selection unit 86.
[0161]
In accordance with the value of the selection control signal SEL2 supplied from the control unit 87, the output destination selection unit 86 outputs the four points of data input in parallel from the adders 70, 75, 80, and 85 from any of the first terminal K0 to the fifth terminal K4. Data output from the first terminal K0 and the second terminal K1 is output to the outside as synthesized data. The data branched from the second terminal K1 and the data output from the third terminal K2 to the fifth terminal K4 are input to the MMU 31. The MMU 31 can transfer the data output from the second terminal K1 to the fifth terminal K4 to the ring memory 32s and store it there.
[0162]
Next, a representative example of the lifting calculation using the filtering unit 33s shown in FIG. 22 will be described with reference to FIGS. 23 to 25. The calculation of the lattice diagram is performed in the same manner as in the lattice diagram described above. For convenience of explanation, FIGS. 23 to 25 do not show the lifting coefficients −α, −β, −γ, −δ and the normalization coefficients κ and 1/κ corresponding to the line segments connecting the lattice points.
[0163]
FIG. 23 shows the lattice diagram at the time the N-th process (N: integer) is completed, and FIGS. 24 and 25 schematically show the (N+1)-th process and the (N+2)-th process, respectively. In the N-th process (FIG. 23), the four conversion processes of the target areas A1, A2, B1, and B2 are executed in parallel within one clock cycle. In the target area A1, the two-point operation of step a (FIG. 38) is executed using the two points of intermediate data D¹_{n+2} and S²_{n+2}, and the second-stage temporary data (D²_{n+2}) on the series starting from the odd-numbered input data Y(2n+5) is calculated. Here, the intermediate data S²_{n+2} lies on the series one point before the series of the intermediate data D¹_{n+2}. In the target area A2, the two points of data D²_n and X(2n) are used to calculate the temporary output data (X(2n+1)) on the series starting from the odd-numbered input data Y(2n+1). In the target area B1, the two-point operation of step b (FIG. 38) is executed using the temporary data (S²_{n+3}) and the intermediate data D¹_{n+3} calculated by the arithmetic processing one clock cycle earlier, and the second-stage intermediate data S²_{n+3} on the series starting from the even-numbered input data Y(2n+6) is calculated. Here, the intermediate data D¹_{n+3} lies on the series one point after the series of the temporary data (S²_{n+3}). In the target area B2, the two-point operation of step b is executed using the temporary output data (X(2n+2)) and the intermediate data D²_{n+1}, and the output data X(2n+2) on the series starting from the even-numbered input data Y(2n+2) is calculated.
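FIG. 38 is not reproduced in this part of the description, so the following is only a sketch under the assumption that step a and step b split the three-point operation of the second embodiment into two halves: step a combines the center point with the point one series before it to form temporary data, and step b combines that temporary data with the point one series after it. The function names are illustrative.

```python
import math


def step_a(center, before, coef):
    """Temporary data: center minus coef times the point one series before."""
    return center - coef * before


def step_b(temporary, after, coef):
    """Final data: temporary minus coef times the point one series after."""
    return temporary - coef * after


def three_point(center, before, after, coef):
    """Reference: the single three-point operation of the second embodiment."""
    return center - coef * (before + after)


# Values are arbitrary; they only show that the two forms agree.
temporary = step_a(10.0, 2.0, 0.5)
final = step_b(temporary, 4.0, 0.5)
```

Under this reading, each two-point arithmetic unit needs only one multiplier input and one adder input, which is consistent with the paired terminals S0/S1, S2/S3, S4/S5, and S6/S7 of the second data selector 65.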
[0164]
In addition, the normalization process of the target area N1 is performed in the period of one clock before the parallel processing in the target areas A1, A2, B1, and B2. In the target area N1, a normalization process for multiplying the input data Y (2n + 7) by the normalization coefficient 1 / κ is executed.
[0165]
The specific contents of the Nth process are as follows. The ring memory 32s has a storage area of 8 lines (series).
In the N-th processing, the arithmetic processing in the target areas A1, A2, B1, and B2 is performed within one clock cycle. The arithmetic processing in the target region N1 is performed one clock cycle before the arithmetic processing. The processing in the cycle one clock before will be described. The MMU 31 outputs the input data Y (2n + 7) temporarily stored in the ring memory 32s to the first data selector 60. The first data selector 60 outputs the input data Y (2n + 7) from the first terminal S0 according to the value of the selection control signal SEL0 from the control unit 87.
[0166]
The input data Y(2n+7) output from the first terminal S0 is input to the first coefficient multiplier 61. In the first coefficient multiplier 61, the coefficient register 62 outputs the normalization coefficient 1/κ, one of the two normalization coefficients κ and 1/κ, to the multiplier 63 in accordance with the control signal C0 supplied from the control unit 87. The multiplier 63 multiplies the input data Y(2n+7) by the normalization coefficient 1/κ. As a result, the first coefficient multiplier 61 calculates the data D¹_{n+3} (= (1/κ) × Y(2n+7)). The output of the multiplier 63 is input to the delay register 64. The above processing is executed in the cycle one clock before the calculation processing in the target areas A1, A2, B1, and B2.
[0167]
In the next clock cycle, the MMU 31 outputs the seven points of data X(2n), D²_n, (X(2n+2)), D²_{n+1}, S²_{n+2}, D¹_{n+2}, and (S²_{n+3}) temporarily stored in the ring memory 32s to the first data selector 60. The first data selector 60 outputs these seven points of data to the second data selector 65 in accordance with the value of the selection control signal SEL0 supplied from the control unit 87. The data D¹_{n+3} held in the delay register 64 is also output to the second data selector 65. The intermediate data D¹_{n+3} output from the delay register 64 is additionally branched to the external MMU 31, which transfers it to the ring memory 32s and overwrites the already-referenced storage area of the input data Y(2n+7).
[0168]
In accordance with the value of the selection control signal SEL1 supplied from the control unit 87, the second data selector 65 selects, from the eight points of data, the two points of data X(2n) and D²_n in the target area A2 and outputs them from the first terminal S0 and the second terminal S1; outputs the intermediate data D²_{n+1} and the temporary data (X(2n+2)) in the target area B2 from the third terminal S2 and the fourth terminal S3; outputs the intermediate data S²_{n+2} and D¹_{n+2} in the target area A1 from the fifth terminal S4 and the sixth terminal S5; and outputs the intermediate data D¹_{n+3} and the temporary data (S²_{n+3}) in the target area B1 from the seventh terminal S6 and the eighth terminal S7.
[0169]
In the second coefficient multiplier 66, the coefficient register 67 outputs the lifting coefficient α to the multiplier 68 in accordance with the control signal C1 supplied from the control unit 87, and the multiplier 68 multiplies the data X(2n) input from the first terminal S0 by the lifting coefficient α and outputs the data α × X(2n). The output of the multiplier 68 has its sign inverted in the two's complement arithmetic circuit 69 and is output to the adder 70. The adder 70 adds the data −α × X(2n) output from the second coefficient multiplier 66 and the data D²_n input from the second terminal S1 of the second data selector 65, thereby calculating the temporary data (X(2n+1)) in the target area A2, and outputs it to the output destination selection unit 86.
[0170]
In the third coefficient multiplier 71, the coefficient register 72 outputs the lifting coefficient β to the multiplier 73 in accordance with the control signal C2 supplied from the control unit 87, and the multiplier 73 multiplies the intermediate data D²_{n+1} input from the third terminal S2 by the lifting coefficient β and outputs the data β × D²_{n+1}. The output of the multiplier 73 has its sign inverted in the two's complement arithmetic circuit 74 and is output to the adder 75. The adder 75 adds the data −β × D²_{n+1} output from the third coefficient multiplier 71 and the temporary output data (X(2n+2)) input from the fourth terminal S3 of the second data selector 65, thereby calculating the output data X(2n+2) in the target area B2, and outputs it to the output destination selection unit 86.
[0171]
In the fourth coefficient multiplier 76, the coefficient register 77 outputs the lifting coefficient γ to the multiplier 78 in accordance with the control signal C3 supplied from the control unit 87, and the multiplier 78 multiplies the intermediate data S²_{n+2} input from the fifth terminal S4 by the lifting coefficient γ and outputs the data γ × S²_{n+2}. The output of the multiplier 78 has its sign inverted in the two's complement arithmetic circuit 79 and is output to the adder 80. The adder 80 adds the data −γ × S²_{n+2} output from the fourth coefficient multiplier 76 and the data D¹_{n+2} input from the sixth terminal S5 of the second data selector 65, thereby calculating the temporary data (D²_{n+2}) in the target area A1, and outputs it to the output destination selection unit 86.
[0172]
In the fifth coefficient multiplier 81, the coefficient register 82 outputs the lifting coefficient δ to the multiplier 83 in accordance with the control signal C4 supplied from the control unit 87, and the multiplier 83 multiplies the intermediate data D¹_{n+3} input from the seventh terminal S6 by the lifting coefficient δ and outputs the data δ × D¹_{n+3}. The output of the multiplier 83 has its sign inverted in the two's complement arithmetic circuit 84 and is output to the adder 85. The adder 85 adds the data −δ × D¹_{n+3} output from the fifth coefficient multiplier 81 and the temporary data (S²_{n+3}) input from the eighth terminal S7 of the second data selector 65, thereby calculating the second-stage intermediate data S²_{n+3} in the target area B1, and outputs it to the output destination selection unit 86.
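The four parallel two-point operations described above can be sketched as one function call per clock cycle. This is an illustrative Python model under stated assumptions, not the hardware itself: the names `two_point` and `nth_cycle` are hypothetical, `sel` stands for the eight values on terminals S0 to S7 of the second data selector 65, and the coefficient values are passed in rather than fixed.

```python
def two_point(coef, multiplied, added):
    """Multiply, invert the sign, then add: added + (-coef * multiplied)."""
    return added + (-coef * multiplied)


def nth_cycle(sel, alpha, beta, gamma, delta):
    """Model one clock cycle of the four coefficient-multiplier/adder pairs.

    sel[0]..sel[7] model terminals S0..S7; the even-indexed terminal of each
    pair feeds the multiplier and the odd-indexed terminal feeds the adder.
    """
    return (
        two_point(alpha, sel[0], sel[1]),  # adder 70
        two_point(beta, sel[2], sel[3]),   # adder 75
        two_point(gamma, sel[4], sel[5]),  # adder 80
        two_point(delta, sel[6], sel[7]),  # adder 85
    )
```

For the N-th process, `sel` would hold X(2n), D²_n, D²_{n+1}, (X(2n+2)), S²_{n+2}, D¹_{n+2}, D¹_{n+3}, (S²_{n+3}) in that order, and the four results would correspond to (X(2n+1)), X(2n+2), (D²_{n+2}), and S²_{n+3}.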
[0173]
In accordance with the value of the selection control signal SEL2 supplied from the control unit 87, the output destination selection unit 86 outputs the output data X(2n+2) input from the adder 75 to the outside from the second terminal K1. The output data X(2n+2) is also output to the MMU 31. Further, in accordance with the selection control signal SEL2, the output destination selection unit 86 outputs the three points of data (X(2n+1)), (D²_{n+2}), and S²_{n+3} from the third terminal K2 to the fifth terminal K4 to the MMU 31. The MMU 31 transfers the four points of data (X(2n+1)), X(2n+2), (D²_{n+2}), and S²_{n+3} to the ring memory 32s and overwrites the already-referenced storage areas of D²_n, (X(2n+2)), D¹_{n+2}, and (S²_{n+3}).
[0174]
Next, the conversion processes in the target areas A3, A4, B3, and B4 in the N+1-th process (FIG. 24) are executed simultaneously in parallel. In the target area A3, the two-point operation of step a (FIG. 38) is executed using the intermediate data S¹_{n+4} calculated one clock cycle earlier and the intermediate data D¹_{n+3}, and the second-stage temporary data (S²_{n+4}) on the series starting from the even-numbered input data Y(2n+8) is calculated. Here, the intermediate data D¹_{n+3} lies on the series one point before that of the intermediate data S¹_{n+4}. In the target area A4, the two-point operation of step a is executed using the two points of data S²_{n+2} and D²_{n+1}, and the temporary output data (X(2n+4)) on the series starting from the even-numbered input data Y(2n+4) is calculated. In the target area B3, the two-point operation of step b (FIG. 38) is executed using the temporary data (D²_{n+2}) and the intermediate data S²_{n+3}, and the second-stage intermediate data D²_{n+2} on the series starting from the odd-numbered input data Y(2n+5) is calculated. Here, the intermediate data S²_{n+3} lies on the series one point after that of the temporary data (D²_{n+2}). In the target area B4, the two-point operation of step b is executed using the temporary output data (X(2n+1)) and the output data X(2n+2), and the output data X(2n+1) on the series starting from the odd-numbered input data Y(2n+1) is calculated.
[0175]
Further, the normalization process of the target area N2 is performed in the period of one clock before the parallel processing in the target areas A3, A4, B3, and B4. In the target area N2, a normalization process for multiplying the input data Y (2n + 8) by the normalization coefficient κ is executed.
[0176]
Next, the details of the N + 1th process are as follows. The processing in the target area N2 in the cycle one clock before will be described. The MMU 31 outputs the input data Y (2n + 8) temporarily stored in the ring memory 32s to the first data selector 60. The first data selector 60 outputs the input data Y (2n + 8) from the first terminal S0 according to the value of the selection control signal SEL0 from the control unit 87.
[0177]
The input data Y(2n+8) output from the first terminal S0 is input to the first coefficient multiplier 61. In the first coefficient multiplier 61, the coefficient register 62 outputs the normalization coefficient κ, one of the two normalization coefficients κ and 1/κ, to the multiplier 63 in accordance with the control signal C0 supplied from the control unit 87. The multiplier 63 multiplies the input data Y(2n+8) by the normalization coefficient κ. As a result, the first coefficient multiplier 61 calculates the data S¹_{n+4} (= κ × Y(2n+8)). The output of the multiplier 63 is input to the delay register 64. The above processing is executed in the cycle one clock before the calculation processing in the target areas A3, A4, B3, and B4.
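The normalization performed by the first coefficient multiplier 61 can be summarized in a few lines. The sketch below is a hedged Python illustration: the function name `normalize` is hypothetical, and the default value of `KAPPA` is an assumption (the scaling constant commonly quoted for the JPEG 2000 9/7 filter), since this passage does not state a numerical value for κ.

```python
# Assumed value for illustration only; the passage gives no number for kappa.
KAPPA = 1.230174105


def normalize(y, kappa=KAPPA):
    """Return the first-stage data for an interleaved input sequence Y.

    Even-numbered points are multiplied by kappa (giving S1 values) and
    odd-numbered points by 1/kappa (giving D1 values), as in the text.
    """
    return [v * kappa if i % 2 == 0 else v * (1.0 / kappa)
            for i, v in enumerate(y)]
```

For example, with `kappa=2.0` the input `[1.0, 1.0]` becomes `[2.0, 0.5]`.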
[0178]
In the next clock cycle, the MMU 31 outputs the seven points of data (X(2n+1)), X(2n+2), D²_{n+1}, S²_{n+2}, (D²_{n+2}), S²_{n+3}, and D¹_{n+3} temporarily stored in the ring memory 32s to the first data selector 60. The first data selector 60 outputs these seven points of data to the second data selector 65 in accordance with the value of the selection control signal SEL0 supplied from the control unit 87. The intermediate data S¹_{n+4} held in the delay register 64 is also output to the second data selector 65. The intermediate data S¹_{n+4} output from the delay register 64 is additionally branched to the external MMU 31, which transfers it to the ring memory 32s and overwrites the already-referenced storage area of the input data Y(2n+8).
[0179]
In accordance with the value of the selection control signal SEL1 supplied from the control unit 87, the second data selector 65 selects, from the eight points of data, the two points of data X(2n+2) and (X(2n+1)) in the target area B4 and outputs them from the first terminal S0 and the second terminal S1; outputs the intermediate data D²_{n+1} and S²_{n+2} in the target area A4 from the third terminal S2 and the fourth terminal S3; outputs the intermediate data S²_{n+3} and the temporary data (D²_{n+2}) in the target area B3 from the fifth terminal S4 and the sixth terminal S5; and outputs the intermediate data D¹_{n+3} and S¹_{n+4} in the target area A3 from the seventh terminal S6 and the eighth terminal S7.
[0180]
In the second coefficient multiplier 66, the coefficient register 67 outputs the lifting coefficient α to the multiplier 68 in accordance with the control signal C1 supplied from the control unit 87, and the multiplier 68 multiplies the data X(2n+2) input from the first terminal S0 by the lifting coefficient α and outputs the data α × X(2n+2). The output of the multiplier 68 has its sign inverted in the two's complement arithmetic circuit 69 and is output to the adder 70. The adder 70 adds the data −α × X(2n+2) output from the second coefficient multiplier 66 and the temporary data (X(2n+1)) input from the second terminal S1 of the second data selector 65, thereby calculating the output data X(2n+1) in the target area B4, and outputs it to the output destination selection unit 86.
[0181]
In the third coefficient multiplier 71, the coefficient register 72 outputs the lifting coefficient β to the multiplier 73 in accordance with the control signal C2 supplied from the control unit 87, and the multiplier 73 multiplies the intermediate data D²_{n+1} input from the third terminal S2 by the lifting coefficient β and outputs the data β × D²_{n+1}. The output of the multiplier 73 has its sign inverted in the two's complement arithmetic circuit 74 and is output to the adder 75. The adder 75 adds the data −β × D²_{n+1} output from the third coefficient multiplier 71 and the intermediate data S²_{n+2} input from the fourth terminal S3 of the second data selector 65, thereby calculating the temporary output data (X(2n+4)) in the target area A4, and outputs it to the output destination selection unit 86.
[0182]
In the fourth coefficient multiplier 76, the coefficient register 77 outputs the lifting coefficient γ to the multiplier 78 in accordance with the control signal C3 supplied from the control unit 87, and the multiplier 78 multiplies the intermediate data S²_{n+3} input from the fifth terminal S4 by the lifting coefficient γ and outputs the data γ × S²_{n+3}. The output of the multiplier 78 has its sign inverted in the two's complement arithmetic circuit 79 and is output to the adder 80. The adder 80 adds the data −γ × S²_{n+3} output from the fourth coefficient multiplier 76 and the temporary data (D²_{n+2}) input from the sixth terminal S5 of the second data selector 65, thereby calculating the intermediate data D²_{n+2} in the target area B3, and outputs it to the output destination selection unit 86.
[0183]
In the fifth coefficient multiplier 81, the coefficient register 82 outputs the lifting coefficient δ to the multiplier 83 in accordance with the control signal C4 supplied from the control unit 87, and the multiplier 83 multiplies the intermediate data D¹_{n+3} input from the seventh terminal S6 by the lifting coefficient δ and outputs the data δ × D¹_{n+3}. The output of the multiplier 83 has its sign inverted in the two's complement arithmetic circuit 84 and is output to the adder 85. The adder 85 adds the data −δ × D¹_{n+3} output from the fifth coefficient multiplier 81 and the intermediate data S¹_{n+4} input from the eighth terminal S7 of the second data selector 65, thereby calculating the second-stage temporary data (S²_{n+4}) in the target area A3, and outputs it to the output destination selection unit 86.
[0184]
In accordance with the value of the selection control signal SEL2 supplied from the control unit 87, the output destination selection unit 86 outputs the output data X(2n+1) input from the adder 70 to the outside from the first terminal K0. Further, in accordance with the selection control signal SEL2, the output destination selection unit 86 outputs the three points of data (X(2n+4)), D²_{n+2}, and (S²_{n+4}) input from the adders 75, 80, and 85 from the third terminal K2 to the fifth terminal K4 to the MMU 31. The MMU 31 transfers the three points of data (X(2n+4)), D²_{n+2}, and (S²_{n+4}) output from the filtering unit 33s to the ring memory 32s and overwrites the already-referenced storage areas of S²_{n+2}, (D²_{n+2}), and S¹_{n+4}.
[0185]
Next, the four conversion processes of the target areas A5, A6, B5, and B6 in the N + 2nd process (FIG. 25) are simultaneously executed in parallel within one clock cycle. In addition, the normalization process of the target area N3 is performed in the period of one clock before the parallel processing in the target areas A5, A6, B5, and B6.
[0186]
The target areas A6, B6, A5, B5, and N3 are the areas obtained by moving the target areas A2, B2, A1, B1, and N1 of the N-th process (FIG. 23) backward by two series (two points), respectively. In these target areas A6, B6, A5, B5, and N3, processes similar to those in the target areas A2, B2, A1, B1, and N1 are executed. As a result, the temporary data (X(2n+3)) is calculated in the target area A6, the output data X(2n+4) in the target area B6, the temporary data (D²_{n+3}) in the target area A5, the intermediate data S²_{n+4} in the target area B5, and the intermediate data D¹_{n+4} in the target area N3.
[0187]
Next, in the N+3-th process (not shown), processing similar to the N+1-th process is executed in the areas obtained by moving the target areas B4, A4, B3, A3, and N2 of the N+1-th process (FIG. 24) backward by two series (two points).
[0188]
As described above, the same processes as the N-th process (FIG. 23) and the N+1-th process (FIG. 24) are executed repeatedly while the target areas are moved, until all output data has been calculated. As a result, the average period required to calculate one point of even-numbered or odd-numbered output data can be reduced to one clock cycle, which greatly shortens the time needed to calculate the output data.
[0189]
Since the wavelet transform apparatus according to the present embodiment includes a horizontal filtering unit and a vertical filtering unit having the configuration illustrated in FIG. 22, it can execute the same line-based two-dimensional inverse DWT processing as in the second embodiment. The wavelet transform can therefore be performed at high speed in a very short time.
[0190]
In the third embodiment as well, as described for the second embodiment, the horizontal filtering unit 33s outputs a horizontal line in each process and the vertical filtering unit 33s takes in a pixel column in each process, so the line buffer circuit 5 is not required, unlike the wavelet transform device 1 according to the first embodiment. It is therefore possible to realize an inexpensive wavelet transform device with a small circuit scale and low power consumption.
[0191]
<Modification>
FIG. 26 is a diagram showing a schematic configuration of a two-dimensional wavelet transform device 30a according to a modification of the second and third embodiments described above. The wavelet transform device 30a includes a buffer 88 that temporarily holds sub-band two-dimensional image data, an MMU (memory management unit) 89 that operates in synchronization with an externally supplied clock signal CLK, a first ring memory 32 or 32s, a horizontal filtering unit 33 or 33s, a second ring memory 3B, and a vertical filtering unit 4B.
[0192]
Here, the second ring memory 3B and the vertical filtering unit 4B have the same configuration as the ring memory 3 and the filtering unit 4 according to the first embodiment. Therefore, the second ring memory 3B and the vertical filtering unit 4B according to this modification can calculate one point of output data in a period of four clock cycles.
[0193]
The first ring memory 32 or 32s and the horizontal filtering unit 33 or 33s have the same configuration as the ring memory 32 and the filtering unit 33 according to the second embodiment, or as the ring memory 32s and the filtering unit 33s according to the third embodiment. Therefore, the first ring memory 32 or 32s and the horizontal filtering unit 33 or 33s of this modification can calculate one point of output data in one clock cycle.
[0194]
Therefore, in this modification, the horizontal filtering unit 33 or 33s performs processing so as to capture input data from the first ring memory 32 at intervals of 4 clock cycles. As a result, the line buffer circuit 5 is not required, unlike the wavelet transform device 1 (FIG. 1) according to the first embodiment. It is therefore possible to realize an inexpensive wavelet transform apparatus with a small circuit scale and small memory usage.
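The reason that matching the capture interval to the slower unit avoids an intermediate buffer can be illustrated with a toy queue model. This Python sketch is only a simplified assumption about the data flow, with hypothetical names: a producer emits one point every `capture_interval` clock cycles, a consumer finishes one point every `consume_interval` clock cycles, and the function reports the largest number of points that ever wait between them.

```python
def max_backlog(total_cycles, capture_interval, consume_interval):
    """Peak number of points waiting between producer and consumer."""
    backlog = 0
    peak = 0
    for cycle in range(total_cycles):
        if cycle % capture_interval == 0:
            backlog += 1  # the horizontal unit emits one point
        peak = max(peak, backlog)
        if cycle % consume_interval == consume_interval - 1 and backlog:
            backlog -= 1  # the vertical unit finishes one point
    return peak
```

In this model, `max_backlog(100, 4, 4)` stays at one point, whereas `max_backlog(100, 1, 4)` keeps growing with the number of cycles, which is the situation a line buffer would otherwise have to absorb.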
[0195]
In the present modification, the ring memory and the filtering unit of the first embodiment are used as the second ring memory 3B and the vertical filtering unit 4B. Instead, the second ring memory and the vertical filtering unit may adopt a configuration that, as described in the related art, calculates one point of output data in an average of 5 clock cycles. In this case, the horizontal filtering unit 33 or 33s performs processing so as to capture input data from the first ring memory 32 at intervals of 5 clock cycles. This also yields a configuration that does not require the line buffer circuit 5.
[0196]
<Fourth Embodiment>
Next, a wavelet transform apparatus and a wavelet transform method according to the fourth embodiment of the present invention will be described. FIG. 27 is a diagram illustrating a schematic configuration of a wavelet transform device 90 according to the fourth embodiment. The wavelet transform device 90 includes a buffer 91 that temporarily holds sub-band two-dimensional image data, an MMU (memory management unit) 92 that operates in synchronization with an externally supplied clock signal CLK, a first ring memory 32H, a first horizontal filtering unit 33H, a second ring memory 32L, a second horizontal filtering unit 33L, a third ring memory 93, and a vertical filtering unit 94. Here, the first ring memory 32H, the first horizontal filtering unit 33H, the second ring memory 32L, the second horizontal filtering unit 33L, the third ring memory 93, and the vertical filtering unit 94 operate in synchronization with an externally supplied pixel clock signal PCLK.
[0197]
In the present embodiment, the MMU 92, the first horizontal filtering unit 33H, the second horizontal filtering unit 33L, and the vertical filtering unit 94 are configured by hardware, but they may instead be implemented as a computer program comprising a group of instructions executed by a microprocessor.
[0198]
The wavelet transform device 90 has a function of performing line-based two-dimensional inverse DWT once on two-dimensional image data. The first and second horizontal filtering units 33H and 33L and the vertical filtering unit 94 are connected via a third ring memory 93, respectively.
[0199]
The MMU 92 has a function of controlling data input/output of the buffer 91, the first ring memory 32H, the second ring memory 32L, and the third ring memory 93, and can transfer the sub-band two-dimensional image data read from the buffer 91 to the first ring memory 32H and the second ring memory 32L for storage.
[0200]
Here, the four subband data 23LL, 23HL, 23LH, and 23HH shown in FIG. 11 are input to the buffer 91. Image data of horizontal width W and vertical height H/2, in which the horizontal pixels of the subbands 23LH and 23HH are alternately arranged, is input to the first ring memory 32H, and image data of horizontal width W and vertical height H/2, in which the horizontal pixels of the subbands 23LL and 23HL are alternately arranged, is input to the second ring memory 32L.
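The alternate arrangement of horizontal pixels can be sketched as a simple column interleave. The following Python fragment is illustrative only: the function name `interleave_columns` is hypothetical, each subband is modelled as a list of rows of width W/2, and placing the first argument at the even-numbered positions is an assumption about the ordering.

```python
def interleave_columns(first, second):
    """Alternate the pixels of two equally sized subbands, row by row.

    Each input has rows of width W/2; each output row has width W, with
    pixels of `first` at even positions and pixels of `second` at odd ones.
    """
    return [
        [pixel for pair in zip(row_a, row_b) for pixel in pair]
        for row_a, row_b in zip(first, second)
    ]
```

For instance, interleaving the single-row subbands `[[1, 2]]` and `[[10, 20]]` gives `[[1, 10, 2, 20]]`, doubling the width while leaving the height unchanged.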
[0201]
The first horizontal filtering unit 33H filters the data input from the first ring memory 32H in the horizontal direction of the two-dimensional image, and can thereby calculate, one point per clock cycle of the pixel clock signal PCLK, the data of the subband 23H, which is the image data obtained by synthesizing the subbands 23LH and 23HH. The image data Y_H(m) of the subband 23H calculated in this way is transferred to the third ring memory 93.
[0202]
The second horizontal filtering unit 33L filters the data input from the second ring memory 32L in the horizontal direction of the two-dimensional image, and can thereby calculate, one point per clock cycle of the pixel clock signal PCLK, the data of the subband 23L, which is the image data obtained by synthesizing the subbands 23LL and 23HL. The image data Y_L(m) of the subband 23L calculated in this way is transferred to the third ring memory 93.
[0203]
The first horizontal filtering unit 33H and the second horizontal filtering unit 33L may adopt the same configuration as the filtering unit 33 or 33s according to the second or third embodiment.
[0204]
On the other hand, the vertical filtering unit 94 receives the image data Y_L(m) and Y_H(m) of the subbands 23L and 23H from the third ring memory 93. By filtering, for each pixel column, the data in which the vertical lines of the image data Y_L(m) and Y_H(m) are alternately arranged, it can calculate two points of the vertical line data of the image data 23 in one clock cycle of the pixel clock signal PCLK.
[0205]
FIG. 28 shows a schematic configuration of the vertical filtering unit 94 according to the present embodiment. The vertical filtering unit 94 includes a first data selector 95 that selectively captures input data, first and second coefficient multipliers 96 and 100, delay registers 99 and 103, a second data selector 104, four preceding-stage adders 105, 111, 117, and 123, third to sixth coefficient multipliers 106, 112, 118, and 124, four subsequent-stage adders 110, 116, 122, and 128, an output destination selection unit (DMUX) 129, and a control unit 130. Among these components, the set of the two adders 105 and 110 and the third coefficient multiplier 106 forms a three-point arithmetic unit, because it processes three points of data within one clock cycle. Likewise, the set of the two adders 111 and 116 and the fourth coefficient multiplier 112, the set of the two adders 117 and 122 and the fifth coefficient multiplier 118, and the set of the two adders 123 and 128 and the sixth coefficient multiplier 124 each form a three-point arithmetic unit, because each processes three points of data within one clock cycle. The four sets of three-point arithmetic units and the output destination selection unit 129 constitute intermediate data calculation means.
[0206]
The control unit 130 operates in synchronization with the pixel clock signal PCLK. In accordance with the value of the selection control signal SEL0 supplied from the control unit 130, the first data selector 95 selectively outputs the data fetched from the third ring memory 93 (data in which the vertical lines of Y_L(m) and Y_H(m) are alternately arranged) from any of the first terminal S0 to the twelfth terminal S11.
[0207]
Data output from the first terminal S0 or the second terminal S1 of the first data selector 95 is input to the first coefficient multiplier 96 and the second coefficient multiplier 100. In the first coefficient multiplier 96, the coefficient register 97 outputs the normalization coefficient κ to the multiplier 98 in accordance with the control signal C0 supplied from the control unit 130, and the multiplier 98 multiplies the input data by the normalization coefficient κ and outputs the product to the delay register 99. In the second coefficient multiplier 100, the coefficient register 101 outputs the normalization coefficient 1/κ to the multiplier 102 in accordance with the control signal C1 supplied from the control unit 130, and the multiplier 102 multiplies the input data by the normalization coefficient 1/κ and outputs the product to the delay register 103. The pair of the first coefficient multiplier 96 and the delay register 99 and the pair of the second coefficient multiplier 100 and the delay register 103 constitute the normalization means of the present invention.
[0208]
The data input to the delay register 99 and the delay register 103 is output to the second data selector 104 after being delayed by one clock cycle of the pixel clock signal PCLK. The data input to the delay register 103 is branched and output to the MMU 92.
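The one-clock delay introduced by the delay registers 99 and 103 can be modelled very simply. The class below is a minimal Python sketch with hypothetical names, assuming that a register outputs on each clock the value it latched on the previous clock and outputs nothing meaningful (modelled as `None`) before the first value has been latched.

```python
class DelayRegister:
    """Hold one value and release it one clock cycle later."""

    def __init__(self):
        self._held = None  # nothing latched before the first clock

    def clock(self, value):
        """Latch `value` and return what was latched on the previous clock."""
        previous = self._held
        self._held = value
        return previous
```

Feeding the normalized data of successive cycles through `clock` therefore yields each value exactly one call later, which is how normalized data becomes available to the second data selector 104 in the cycle after it was computed.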
[0209]
The data output from the third terminal S2 to the twelfth terminal S11 of the first data selector 95 is output to the second data selector 104. In accordance with the selection control signal SEL1 supplied from the control unit 130, the second data selector 104 outputs each item of data to the four sets of three-point arithmetic units, and these three-point arithmetic units execute their processing in parallel.
[0210]
The previous stage adder 105 adds the two points of data output from the first terminal S 0 and the second terminal S 1 of the second data selector 104 and outputs the result to the third coefficient multiplier 106. In the third coefficient multiplier 106, the coefficient register 107 outputs the lifting coefficient α to the multiplier 108 according to the control signal C 2 supplied from the control unit 130, and the multiplier 108 receives the data input from the adder 105. Is multiplied by a lifting coefficient α. The multiplication output is inverted in the 2's complement arithmetic circuit 109 and output to the adder 110 at the subsequent stage. Then, the adder 110 at the subsequent stage adds the data input from the third coefficient multiplier 106 and the data input from the third terminal S2 of the second data selector 104 and outputs the result to the output destination selection unit 129.
[0211]
Further, the previous stage adder 111 adds the two points of data output from the fourth terminal S3 and the fifth terminal S4 of the second data selector 104, and outputs the result to the fourth coefficient multiplier 112. In the fourth coefficient multiplier 112, the coefficient register 113 outputs the lifting coefficient β to the multiplier 114 in accordance with the control signal C3 supplied from the control unit 130, and the multiplier 114 receives the data input from the adder 111. Is multiplied by a lifting coefficient β. The multiplication output is inverted in sign by 2's complement arithmetic circuit 115 and output to adder 116 at the subsequent stage. The subsequent adder 116 adds the data input from the fourth coefficient multiplier 112 and the data input from the sixth terminal S5 of the second data selector 104 and outputs the result to the output destination selection unit 129.
[0212]
The adder 117 at the previous stage adds the two points of data output from the seventh terminal S6 and the eighth terminal S7 of the second data selector 104 and outputs the result to the fifth coefficient multiplier 118. In the fifth coefficient multiplier 118, the coefficient register 119 outputs the lifting coefficient γ to the multiplier 120 according to the control signal C 4 supplied from the control unit 130, and the multiplier 120 receives the data input from the adder 117. Is multiplied by a lifting coefficient γ. The multiplication output is inverted in the 2's complement arithmetic circuit 121 and output to the adder 122 in the subsequent stage. The post-stage adder 122 adds the data input from the fifth coefficient multiplier 118 and the data input from the ninth terminal S8 of the second data selector 104, and outputs the result to the output destination selection unit 129.
[0213]
Likewise, the preceding-stage adder 123 adds the two points of data output from the tenth terminal S9 and the eleventh terminal S10 of the second data selector 104 and outputs the sum to the sixth coefficient multiplier 124. In the sixth coefficient multiplier 124, the coefficient register 125 outputs the lifting coefficient δ to the multiplier 126 in accordance with the control signal C5 supplied from the control unit 130, and the multiplier 126 multiplies the data input from the adder 123 by the lifting coefficient δ. The sign of the product is inverted in the two's complement arithmetic circuit 127, and the result is output to the subsequent-stage adder 128. The subsequent-stage adder 128 adds the data input from the sixth coefficient multiplier 124 and the data input from the twelfth terminal S11 of the second data selector 104 and outputs the result to the output destination selection unit 129.
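The four datapaths described above (adders 105/110, 111/116, 117/122, and 123/128) all perform the same three-point operation. The following is a minimal software sketch of one such datapath, not the circuit itself; the function name `three_point_op` and its argument names are assumptions introduced for illustration.

```python
def three_point_op(a, b, c, coeff):
    """Model one datapath: preceding-stage adder, coefficient multiplier,
    sign inversion, subsequent-stage adder."""
    pre_sum = a + b            # preceding-stage adder (e.g. adder 105)
    product = coeff * pre_sum  # multiplier fed by the coefficient register
    negated = -product         # sign inversion (two's complement circuit)
    return negated + c         # subsequent-stage adder (e.g. adder 110)


if __name__ == "__main__":
    # Two neighbouring points a, b, the centre point c, and a coefficient.
    print(three_point_op(1.0, 2.0, 10.0, 0.5))
```

Because the coefficient register holds α (and likewise β, γ, δ) and the sign is inverted afterwards, the effective weight on the summed neighbours is −α, which matches the coefficients −α, −β, −γ, −δ named for the lattice diagrams.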
[0214]
In accordance with the value of the selection control signal SEL2 supplied from the control unit 130, the output destination selection unit 129 selectively outputs the four points of data input in parallel from the subsequent-stage adders 110, 116, 122, and 128 from any of the first terminal K0 to the fourth terminal K3.
[0215]
The output destination selection unit 129 outputs the output data X (2k) and X (2k + 1) from the first terminal K0 and the second terminal K1. The data output from the first terminal K0, the third terminal K2, and the fourth terminal K3 of the output destination selection unit 129 is also output to the MMU 92. The MMU 92 can transfer the data output from the first terminal K0, the third terminal K2, and the fourth terminal K3 to the third ring memory 93 and overwrite the referenced storage area.
[0216]
Next, a representative example of the lifting calculation using the vertical filtering unit 94 is described with reference to FIGS. 29 to 31. FIGS. 29 to 31 are lattice diagrams schematically showing the lifting configuration of the 9 × 7 tap Daubechies filter. These lattice diagrams are evaluated in the same manner as the lattice diagrams described earlier. For convenience of explanation, FIGS. 29 to 31 do not show the lifting coefficients -α, -β, -γ, -δ and the normalization coefficients κ, 1/κ that correspond to the line segments connecting the lattice points.
[0217]
FIGS. 29 to 31 schematically show the N-th (N is an integer) to N + 2-th processing in the present embodiment.
[0218]
In the N-th process (FIG. 29), four conversion processes of the target areas C1, C2, C3, and C4 are simultaneously executed in parallel within one clock cycle.
[0219]
In the target area C1, a multiplication value is calculated by multiplying the sum of the two points of intermediate data D¹_{n+4} and D¹_{n+5} by the lifting coefficient −δ, and a three-point operation that adds this multiplication value to the intermediate data S¹_{n+5} is then executed. As a result, the second-stage intermediate data S²_{n+5} on the series starting from the even-numbered input data Y(2n+10) is calculated. Here, the two points of intermediate data D¹_{n+4} and D¹_{n+5} are data on the series one point before and one point after the series of the intermediate data S¹_{n+5}.
[0220]
In the target area C2, a multiplication value is calculated by multiplying the sum of the two points of intermediate data S²_{n+3} and S²_{n+4} by the lifting coefficient −γ, and a three-point operation that adds this multiplication value to the intermediate data D¹_{n+3} is then executed. As a result, the second-stage intermediate data D²_{n+3} on the series starting from the odd-numbered input data Y(2n+7) is calculated. Here, the two points of intermediate data S²_{n+3} and S²_{n+4} are data on the series one point before and one point after the series of the intermediate data D¹_{n+3}.
[0221]
In the target area C3, a multiplication value is calculated by multiplying the sum of the two points of intermediate data D²_{n+1} and D²_{n+2} by the lifting coefficient −β, and a three-point operation that adds this multiplication value to the intermediate data S²_{n+2} is then executed. As a result, the output data X(2n+4) on the series starting from the input data Y(2n+4) is calculated. Here, the two points of intermediate data D²_{n+1} and D²_{n+2} are data on the series one point before and one point after the series of the intermediate data S²_{n+2}.
[0222]
In the target area C4, a multiplication value is calculated by multiplying the sum of the even-numbered output data X(2n) and X(2n+2) by the lifting coefficient -α, and a three-point operation that adds this multiplication value to the intermediate data D²_n is then executed. As a result, the output data X(2n+1) on the series starting from the input data Y(2n+1) is calculated. Here, the two points of even-numbered output data X(2n) and X(2n+2) are data on the series one point before and one point after the series of the intermediate data D²_n.
[0223]
The arithmetic processes in the target areas N1 and N2 are executed in parallel in the cycle one clock before the arithmetic processes in the target areas C1 to C4. In the target area N1, a normalization process that multiplies the input data Y(2n+10) by the normalization coefficient κ is executed, and the intermediate data S¹_{n+5} is calculated. In the target area N2, a normalization process that multiplies the input data Y(2n+11) by the normalization coefficient 1/κ is executed, and the intermediate data D¹_{n+5} is calculated.
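For reference, the two normalization processes and four conversion processes of the N-th process described in paragraphs [0219] to [0223] can be restated in equation form. This is only a summary of the text, using the same symbols; no values for the coefficients are implied.

```latex
\begin{align*}
\text{N1:}\quad S^{1}_{n+5} &= \kappa\, Y(2n+10) \\
\text{N2:}\quad D^{1}_{n+5} &= \tfrac{1}{\kappa}\, Y(2n+11) \\
\text{C1:}\quad S^{2}_{n+5} &= S^{1}_{n+5} - \delta\,\bigl(D^{1}_{n+4} + D^{1}_{n+5}\bigr) \\
\text{C2:}\quad D^{2}_{n+3} &= D^{1}_{n+3} - \gamma\,\bigl(S^{2}_{n+3} + S^{2}_{n+4}\bigr) \\
\text{C3:}\quad X(2n+4) &= S^{2}_{n+2} - \beta\,\bigl(D^{2}_{n+1} + D^{2}_{n+2}\bigr) \\
\text{C4:}\quad X(2n+1) &= D^{2}_{n} - \alpha\,\bigl(X(2n) + X(2n+2)\bigr)
\end{align*}
```

Every right-hand side uses only data produced in an earlier process or in the preceding normalization cycle, which is why C1 to C4 can be evaluated in the same clock cycle.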
[0224]
The specific contents of the N-th process are as follows. In the N-th process, the arithmetic processing in the target areas C1, C2, C3, and C4 is performed within one clock cycle, whereas the arithmetic processing in the target areas N1 and N2 is performed one clock cycle earlier. The processing in that earlier cycle is described first. The MMU 92 reads the input data Y(2n+10) and Y(2n+11) temporarily stored in the ring memory 93, and, in accordance with the selection control signal SEL0 supplied from the control unit 130, the input data Y(2n+10) is output from the first terminal S0 and the input data Y(2n+11) is output from the second terminal S1.
[0225]
The input data Y(2n+10) output from the first terminal S0 is input to the first coefficient multiplier 96. In the first coefficient multiplier 96, the coefficient register 97 outputs the normalization coefficient κ to the multiplier 98 in accordance with the control signal C0 supplied from the control unit 130, and the multiplier 98 multiplies the input data Y(2n+10) by the normalization coefficient κ. As a result, the first coefficient multiplier 96 calculates the intermediate data S¹_{n+5} (= κ × Y(2n+10)) within one clock cycle.
[0226]
The input data Y(2n+11) output from the second terminal S1 is input to the second coefficient multiplier 100. In the second coefficient multiplier 100, the coefficient register 101 outputs the normalization coefficient 1/κ to the multiplier 102 in accordance with the control signal C1 supplied from the control unit 130, and the multiplier 102 multiplies the input data Y(2n+11) by the normalization coefficient 1/κ. As a result, the second coefficient multiplier 100 calculates the intermediate data D¹_{n+5} (= 1/κ × Y(2n+11)) within one clock cycle.
[0227]
The intermediate data S¹_{n+5} and D¹_{n+5} output from the first and second coefficient multipliers 96 and 100 are input to the delay registers 99 and 103, respectively. The delay registers 99 and 103 output the intermediate data S¹_{n+5} and D¹_{n+5} after delaying them by one clock cycle.
[0228]
One clock cycle after the arithmetic processing in the target areas N1 and N2, the MMU 92 reads the ten points of data X(2n), D²_n, X(2n+2), D²_{n+1}, S²_{n+2}, D²_{n+2}, S²_{n+3}, D¹_{n+3}, S²_{n+4}, D¹_{n+4} temporarily stored in the third ring memory 93 and outputs them to the first data selector 95. The first data selector 95 outputs these ten points of data from the third terminal S2 to the twelfth terminal S11 in accordance with the value of the selection control signal SEL0 supplied from the control unit 130, and this output data is input to the second data selector 104. The intermediate data S¹_{n+5} and D¹_{n+5} held in the delay registers 99 and 103 are also input to the second data selector 104. The intermediate data D¹_{n+5} output from the delay register 103 is additionally branched and output to the external MMU 92, which transfers it to the ring memory 93 and overwrites the storage area of the already referenced input data Y(2n+11).
[0229]
In accordance with the selection control signal SEL1 supplied from the control unit 130, the second data selector 104 selects, from the 12 points of input data, the three points of data X(2n), X(2n+2), D²_n of the target area C4 and outputs them from the first terminal S0 to the third terminal S2, respectively; selects the three points of data D²_{n+1}, D²_{n+2}, S²_{n+2} of the target area C3 and outputs them from the fourth terminal S3 to the sixth terminal S5, respectively; selects the three points of data S²_{n+3}, S²_{n+4}, D¹_{n+3} of the target area C2 and outputs them from the seventh terminal S6 to the ninth terminal S8, respectively; and selects the three points of data D¹_{n+4}, D¹_{n+5}, S¹_{n+5} of the target area C1 and outputs them from the tenth terminal S9 to the twelfth terminal S11, respectively.
[0230]
The preceding-stage adder 105 adds the two points of data X(2n) and X(2n+2) of the target area C4 input from the first terminal S0 and the second terminal S1 of the second data selector 104 and outputs the sum to the third coefficient multiplier 106. In the third coefficient multiplier 106, the coefficient register 107 supplies the lifting coefficient α to the multiplier 108 in accordance with the control signal C2, and the multiplier 108 multiplies the input data by the lifting coefficient α and outputs the product (= α × (X(2n) + X(2n+2))). This product is output to the subsequent-stage adder 110 after its sign is inverted in the two's complement arithmetic circuit 109. The subsequent-stage adder 110 then adds the multiplication value input from the third coefficient multiplier 106 and the data D²_n input from the third terminal S2 of the second data selector 104, thereby calculating the output data X(2n+1) of the target area C4, and outputs it to the output destination selection unit 129.
[0231]
The preceding-stage adder 111 adds the two points of data D²_{n+1} and D²_{n+2} of the target area C3 input from the fourth terminal S3 and the fifth terminal S4 of the second data selector 104 and outputs the sum to the fourth coefficient multiplier 112. In the fourth coefficient multiplier 112, the coefficient register 113 supplies the lifting coefficient β to the multiplier 114 in accordance with the control signal C3, and the multiplier 114 multiplies the input data by the lifting coefficient β and outputs the product (= β × (D²_{n+1} + D²_{n+2})). This product is output to the subsequent-stage adder 116 after its sign is inverted in the two's complement arithmetic circuit 115. The subsequent-stage adder 116 then adds the multiplication value input from the fourth coefficient multiplier 112 and the data S²_{n+2} input from the sixth terminal S5 of the second data selector 104, thereby calculating the output data X(2n+4) of the target area C3, and outputs it to the output destination selection unit 129.
[0232]
The preceding-stage adder 117 adds the two points of data S²_{n+3} and S²_{n+4} of the target area C2 input from the seventh terminal S6 and the eighth terminal S7 of the second data selector 104 and outputs the sum to the fifth coefficient multiplier 118. In the fifth coefficient multiplier 118, the coefficient register 119 supplies the lifting coefficient γ to the multiplier 120 in accordance with the control signal C4, and the multiplier 120 multiplies the input data by the lifting coefficient γ and outputs the product (= γ × (S²_{n+3} + S²_{n+4})). This product is output to the subsequent-stage adder 122 after its sign is inverted in the two's complement arithmetic circuit 121. The subsequent-stage adder 122 then adds the multiplication value input from the fifth coefficient multiplier 118 and the data D¹_{n+3} input from the ninth terminal S8 of the second data selector 104, thereby calculating the intermediate data D²_{n+3} of the target area C2, and outputs it to the output destination selection unit 129.
[0233]
The preceding-stage adder 123 adds the two points of data D¹_{n+4} and D¹_{n+5} of the target area C1 input from the tenth terminal S9 and the eleventh terminal S10 of the second data selector 104 and outputs the sum to the sixth coefficient multiplier 124. In the sixth coefficient multiplier 124, the coefficient register 125 supplies the lifting coefficient δ to the multiplier 126 in accordance with the control signal C5, and the multiplier 126 multiplies the input data by the lifting coefficient δ and outputs the product (= δ × (D¹_{n+4} + D¹_{n+5})). This product is output to the subsequent-stage adder 128 after its sign is inverted in the two's complement arithmetic circuit 127. The subsequent-stage adder 128 then adds the multiplication value input from the sixth coefficient multiplier 124 and the intermediate data S¹_{n+5} input from the twelfth terminal S11 of the second data selector 104, thereby calculating the intermediate data S²_{n+5} of the target area C1, and outputs it to the output destination selection unit 129.
[0234]
In accordance with the value of the selection control signal SEL2, the output destination selection unit 129 outputs the two points of output data input from the two subsequent-stage adders 110 and 116 from the first terminal K0 and the second terminal K1, respectively. The output destination selection unit 129 also outputs the three points of data input from the three subsequent-stage adders 116, 122, and 128 to the MMU 92. The MMU 92 transfers these three points of data X(2n+4), D²_{n+3}, S²_{n+5} to the third ring memory 93 and overwrites the storage areas of the already referenced data S²_{n+2}, D¹_{n+3}, Y(2n+10) with them, respectively.
[0235]
In the next (N+1)-th process (FIG. 30), the conversion processes of the target areas C5, C6, C7, and C8 are executed. In addition, the two normalization processes of the target areas N3 and N4 are executed in the cycle one clock before the conversion processes of the target areas C5, C6, C7, and C8. The target areas C5, C6, C7, C8, N3, and N4 are the target areas C1, C2, C3, C4, N1, and N2 of the N-th process (FIG. 29) moved backward by two series (two points). In the target areas C5, C6, C7, C8, N3, and N4, processing similar to that in the target areas C1, C2, C3, C4, N1, and N2 is executed. Accordingly, the output data X(2n+3) on the series starting from the odd-numbered input data Y(2n+3) is calculated in the target area C8, the output data X(2n+6) on the series starting from the even-numbered input data Y(2n+6) is calculated in the target area C7, the second-stage intermediate data D²_{n+4} on the series starting from the odd-numbered input data Y(2n+9) is calculated in the target area C6, and the second-stage intermediate data S²_{n+6} on the series starting from the even-numbered input data Y(2n+12) is calculated in the target area C5. In the target areas N3 and N4, the normalization processes for the input data Y(2n+12) and Y(2n+13) are executed in the cycle one clock earlier.
[0236]
Further, in the N + 2th process (FIG. 31), the conversion process for the target areas C9, C10, C11, and C12 is performed. In addition, two normalization processes of the target areas N5 and N6 are executed in a cycle one clock before the conversion process of the target areas C9, C10, C11, and C12.
[0237]
As described above, the same process as the N-th process (FIG. 29) is repeatedly executed while moving the target area until the output data of all points is calculated. Thereby, the average period required to calculate the output data of the even-numbered and odd-numbered two points can be set to one clock period, and the calculation period of the output data can be greatly shortened.
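The schedule of FIGS. 29 to 31 can be checked in software. The sketch below compares a stage-by-stage inverse lifting with a schedule in which process n computes S²_{n+5}, D²_{n+3}, X(2n+4), and X(2n+1) from previously stored data only. Several things here are assumptions not stated in the text above: the coefficient values (taken from the JPEG 2000 irreversible 9/7 filter), the boundary handling (index clamping, equivalent to whole-sample symmetric extension for even-length input), and all function names.

```python
# Assumed JPEG 2000 irreversible 9/7 lifting constants; the text only names them.
ALPHA, BETA, GAMMA, DELTA = -1.586134342, -0.052980118, 0.882911075, 0.443506852
KAPPA = 1.230174105


def lift(center, left, right, coeff):
    """Three-point operation: center + (-(coeff * (left + right)))."""
    return center + (-(coeff * (left + right)))


def inverse_staged(y):
    """Reference: complete each lifting stage for all points before the next."""
    m = len(y) // 2
    s1 = [KAPPA * y[2 * i] for i in range(m)]
    d1 = [(1.0 / KAPPA) * y[2 * i + 1] for i in range(m)]
    s2 = [lift(s1[i], d1[max(i - 1, 0)], d1[i], DELTA) for i in range(m)]
    d2 = [lift(d1[i], s2[i], s2[min(i + 1, m - 1)], GAMMA) for i in range(m)]
    xe = [lift(s2[i], d2[max(i - 1, 0)], d2[i], BETA) for i in range(m)]
    xo = [lift(d2[i], xe[i], xe[min(i + 1, m - 1)], ALPHA) for i in range(m)]
    return [v for pair in zip(xe, xo) for v in pair]


def inverse_pipelined(y):
    """Process n yields S2[n+5], D2[n+3], X(2n+4), X(2n+1) in one step."""
    m = len(y) // 2
    s1, d1, s2, d2, xe, xo = ([None] * m for _ in range(6))
    for n in range(-5, m):  # m + 5 processes produce 2 * m output points
        i = n + 5
        if 0 <= i < m:  # normalization areas, one clock ahead of C1..C4
            s1[i] = KAPPA * y[2 * i]
            d1[i] = (1.0 / KAPPA) * y[2 * i + 1]
        pending = []  # compute all results first, store afterwards (parallel)
        if 0 <= n + 5 < m:  # area corresponding to C1
            i = n + 5
            pending.append((s2, i, lift(s1[i], d1[max(i - 1, 0)], d1[i], DELTA)))
        if 0 <= n + 3 < m:  # area corresponding to C2
            i = n + 3
            pending.append((d2, i, lift(d1[i], s2[i], s2[min(i + 1, m - 1)], GAMMA)))
        if 0 <= n + 2 < m:  # area corresponding to C3
            i = n + 2
            pending.append((xe, i, lift(s2[i], d2[max(i - 1, 0)], d2[i], BETA)))
        if 0 <= n < m:  # area corresponding to C4
            pending.append((xo, n, lift(d2[n], xe[n], xe[min(n + 1, m - 1)], ALPHA)))
        for array, index, value in pending:
            array[index] = value
    return [v for pair in zip(xe, xo) for v in pair]


if __name__ == "__main__":
    y = [float((7 * i * i + 3) % 11) for i in range(20)]
    print(inverse_pipelined(y) == inverse_staged(y))
```

In this model, m pairs of input points need m + 5 processes, so for long inputs the cost approaches one process (one clock cycle) per pair of output points, in line with the average period stated above.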
[0238]
Next, line-based two-dimensional inverse DWT processing using the wavelet transform device 90 will be described below.
[0239]
The data input to the first horizontal filtering unit 33H are the subbands 23LH and 23HH shown in FIG. 11, and the data input to the second horizontal filtering unit 33L are the subbands 23LL and 23HL. The first and second horizontal filtering units 33H and 33L then output the subbands 23H (Y_H(M)) and 23L (Y_L(M)), respectively.
[0240]
The data input to the vertical filtering unit 94 are the data Y_H(M) and Y_L(M) output from the first and second horizontal filtering units 33H and 33L. The vertical-line data of Y_H(M) and Y_L(M) are arranged alternately and input as pixel columns in the horizontal direction. The vertical filtering unit 94 then outputs the two-dimensional image data 23.
[0241]
Specifically, the first ring memory 32H and the first horizontal filtering unit 33H output the subband 23H by filtering the data input in units of horizontal lines at one clock cycle per point, and the second ring memory 32L and the second horizontal filtering unit 33L output the subband 23L by filtering the data input in units of horizontal lines at one clock cycle per point.
[0242]
As the first ring memory 32H and the second ring memory 32L, the ring memory 32s shown in FIG. 22 and described in the third embodiment can be used. That ring memory has a storage area 133 that holds 8 points (8 pixels) of data corresponding to the input data string ..., X_j(k), X_{j+1}(k), ..., and can hold the temporary data and intermediate data. Alternatively, the ring memory 32 of FIG. 15 described in the second embodiment can be used as the first ring memory 32H and the second ring memory 32L. That ring memory has a storage area 59 that holds 9 points (9 pixels) of data corresponding to the input data string ..., X_j(k), X_{j+1}(k), ..., and can hold the temporary data and intermediate data.
[0243]
Similarly, the filtering unit 33s of FIG. 22 described in the third embodiment or the filtering unit 33 of FIG. 15 described in the second embodiment can be used as the first and second horizontal filtering units 33H and 33L.
[0244]
The third ring memory 93 and the vertical filtering unit 94 repeatedly execute each process, including the N-th process (FIG. 29) and the (N+1)-th process (FIG. 30), in units of horizontal lines for each pixel column. For example, the N-th process (FIG. 29) is performed on the 0th pixel column, then on the 1st pixel column, then on the 2nd pixel column, and so on, and finally on the (W-1)-th pixel column. Thereafter, the (N+1)-th process (FIG. 30) is performed on the 0th pixel column, then on the 1st pixel column, then on the 2nd pixel column, and so on, and finally on the (W-1)-th pixel column. In this way, each process is repeatedly executed for all pixel columns.
[0245]
As a result, the vertical filtering unit 94 outputs even-numbered row data and odd-numbered row data in parallel in units of horizontal lines. For example, as a result of continuously executing the N-th process (FIG. 29) on the 0th to (W-1)-th pixel columns, the odd-row data X_0(2n+1), X_1(2n+1), ..., X_j(2n+1), ..., X_{W-1}(2n+1) of the (2n+1)-th horizontal line are output continuously. In parallel with this, the even-row data X_0(2n+4), X_1(2n+4), ..., X_j(2n+4), ..., X_{W-1}(2n+4) of the (2n+4)-th horizontal line are output continuously.
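The line-based ordering described above (one process applied to every pixel column before the next process starts) can be sketched as a simple generator. The name `line_based_schedule` and the convention that the process index equals n are assumptions made for illustration.

```python
def line_based_schedule(width, first_process, num_processes):
    """Yield (process n, column j, odd output row, even output row).

    Each process is applied to columns 0..width-1 in order, so the rows
    2n+1 and 2n+4 are produced in parallel, one column per step.
    """
    for n in range(first_process, first_process + num_processes):
        for j in range(width):
            yield n, j, 2 * n + 1, 2 * n + 4


if __name__ == "__main__":
    for step in line_based_schedule(width=2, first_process=0, num_processes=2):
        print(step)
```

With two columns and two processes, the steps are (0, 0, 1, 4), (0, 1, 1, 4), (1, 0, 3, 6), (1, 1, 3, 6): the first process emits rows 1 and 4, and the next emits rows 3 and 6, matching X(2n+3) and X(2n+6) for the (N+1)-th process.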
[0246]
As schematically shown in FIG. 32, the third ring memory 93 has a storage area 132 that holds 12 × W points (12 lines) of data corresponding to the input data string, and holds the temporary data and intermediate data. The storage area 132 is a collection of row areas, each holding 12 points of data in the vertical direction. One row area holds the input data and intermediate data referred to in one process. For example, in the N-th process (FIG. 29), the stored content changes from the data string {X(2n), D²_n, X(2n+2), D²_{n+1}, S²_{n+2}, D²_{n+2}, S²_{n+3}, D¹_{n+3}, S²_{n+4}, D¹_{n+4}, Y(2n+10), Y(2n+11)} to the data string {X(2n), D²_n, X(2n+2), D²_{n+1}, X(2n+4), D²_{n+2}, S²_{n+3}, D²_{n+3}, S²_{n+4}, D¹_{n+4}, S²_{n+5}, D¹_{n+5}} (that is, the data S²_{n+2}, D¹_{n+3}, Y(2n+10), Y(2n+11) are overwritten by the data X(2n+4), D²_{n+3}, S²_{n+5}, D¹_{n+5}, respectively).
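The in-place update of one 12-point area during the N-th process can be sketched as follows. This is a software illustration under assumptions: the function name `nth_process_area`, the list layout (the 12 points in the order listed above), and passing the coefficients as arguments are all introduced here and are not part of the described device.

```python
def nth_process_area(area, alpha, beta, gamma, delta, kappa):
    """Apply the N-th process to one 12-point area.

    area = [X(2n), D2_n, X(2n+2), D2_{n+1}, S2_{n+2}, D2_{n+2},
            S2_{n+3}, D1_{n+3}, S2_{n+4}, D1_{n+4}, Y(2n+10), Y(2n+11)]
    Returns (updated area, X(2n+1), X(2n+4)).
    """
    x0, d2_0, x2, d2_1, s2_2, d2_2, s2_3, d1_3, s2_4, d1_4, y10, y11 = area
    s1_5 = kappa * y10                   # normalization (area N1)
    d1_5 = (1.0 / kappa) * y11           # normalization (area N2)
    s2_5 = s1_5 - delta * (d1_4 + d1_5)  # area C1
    d2_3 = d1_3 - gamma * (s2_3 + s2_4)  # area C2
    x4 = s2_2 - beta * (d2_1 + d2_2)     # area C3
    x1 = d2_0 - alpha * (x0 + x2)        # area C4
    # Four slots are overwritten: S2_{n+2}, D1_{n+3}, Y(2n+10), Y(2n+11).
    updated = [x0, d2_0, x2, d2_1, x4, d2_2, s2_3, d2_3, s2_4, d1_4, s2_5, d1_5]
    return updated, x1, x4


if __name__ == "__main__":
    area = [float(v) for v in range(1, 13)]
    print(nth_process_area(area, 1.0, 1.0, 1.0, 1.0, 2.0))
```

The example call uses placeholder coefficients (all 1.0, κ = 2.0) chosen only to keep the arithmetic easy to follow; only four of the twelve slots change, which is what allows the storage to be reused without extra buffers.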
[0247]
By executing the above processing recursively, subbands (band components) of an arbitrary decomposition level can be synthesized. That is, by inputting the four subbands LL(k+1), HL(k+1), LH(k+1), and HH(k+1) of the (k+1)-th decomposition level (k is an integer of 2 or more) to the wavelet transform device 90, the subband LL(k) of the k-th decomposition level is obtained, and by executing such processing recursively, the original image data can be restored from the subbands of the k-th decomposition level.
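The recursion over decomposition levels can be sketched as a short loop. The names `synthesize_all_levels` and `synthesize_level` are hypothetical; `synthesize_level` stands in for one pass through the wavelet transform device 90 and is supplied by the caller, so nothing here models the filtering itself.

```python
def synthesize_all_levels(ll_coarsest, detail_levels, synthesize_level):
    """Rebuild LL(k) from LL(k+1), HL(k+1), LH(k+1), HH(k+1), level by level.

    detail_levels lists (HL, LH, HH) tuples from the coarsest level to level 1.
    synthesize_level is a caller-supplied stand-in for one synthesis pass.
    """
    ll = ll_coarsest
    for hl, lh, hh in detail_levels:
        ll = synthesize_level(ll, hl, lh, hh)  # output becomes the next LL input
    return ll


if __name__ == "__main__":
    # Toy stand-in that just records which subbands were combined.
    def combine(ll, hl, lh, hh):
        return f"synth({ll},{hl},{lh},{hh})"

    print(synthesize_all_levels("LL2", [("HL2", "LH2", "HH2"), ("HL1", "LH1", "HH1")], combine))
```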
[0248]
As described above, in the wavelet transform device 90 and the wavelet transform method according to the present embodiment, the four conversion processes that calculate four points of intermediate data and the two normalization processes that normalize two points of data are executed in parallel within one clock cycle, so the calculation cycle of the output data can be greatly shortened. The wavelet transform can therefore be executed at high speed in a very short time.
[0249]
The wavelet transform device 90 includes first and second horizontal filtering units 33H and 33L that calculate one point of data within one clock cycle, and a vertical filtering unit 94 that calculates two points of data within one clock cycle. Therefore, two points of combined data can be calculated in parallel within one clock cycle. Therefore, it is possible to execute line-based two-dimensional DWT calculation at extremely high speed.
[0250]
<Modification>
FIG. 34 is a diagram showing a schematic configuration of a two-dimensional wavelet transform device 140 according to a modification of the fourth embodiment described above. The wavelet transform device 140 includes a buffer 91 that temporarily holds the two-dimensional image data of the subbands, an MMU (memory management unit) 92A that operates in synchronization with an externally supplied clock signal CLK, a first ring memory 93A, a horizontal filtering unit 94A, a line buffer circuit 141, a second ring memory 93B, and a vertical filtering unit 94B.
[0251]
Here, the horizontal filtering unit 94A and the vertical filtering unit 94B have the same configuration as the vertical filtering unit 94 (FIG. 28) according to the fourth embodiment, and are supplied with data and controlled so as to execute the lifting operation shown in the figures described above.
[0252]
The horizontal filtering unit 94A outputs the data of the subbands 23H and 23L alternately in units of horizontal lines.
[0253]
In the line buffer circuit 141, the first line buffer 143 and the second line buffer 144 each comprise buffers for two horizontal lines. While the selector 142 stores two lines of input data in one of the first line buffer 143 and the second line buffer 144, the demultiplexer 145 reads the two lines of data stored in the other and outputs them to the second ring memory 93B.
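The alternating use of the two line buffers is a double-buffering (ping-pong) arrangement. The sketch below models it in software; the class name `PingPongLineBuffer` and the method `push_pair` are assumptions, and the selector and demultiplexer are reduced to a single index swap.

```python
class PingPongLineBuffer:
    """Two buffers of two lines each: one is written while the other is read."""

    def __init__(self):
        self.buffers = [[], []]  # stand-ins for line buffers 143 and 144
        self.write_index = 0     # which buffer the selector currently fills

    def push_pair(self, line_a, line_b):
        """Store two lines; return the two lines held in the other buffer.

        Returns None until the other buffer has been filled once.
        """
        read_index = 1 - self.write_index
        ready = tuple(self.buffers[read_index]) if self.buffers[read_index] else None
        self.buffers[self.write_index] = [line_a, line_b]
        self.write_index = read_index  # swap roles for the next pair
        return ready


if __name__ == "__main__":
    buf = PingPongLineBuffer()
    for k in range(3):
        print(buf.push_pair(f"H{k}", f"L{k}"))
```

Each call returns the pair stored one call earlier, so the downstream stage always reads a complete pair of lines while the next pair is being written.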
[0254]
As described above, even with the configuration of this modification example, two points of combined data can be calculated in parallel within one clock cycle, and therefore, line-based two-dimensional DWT calculation can be executed at extremely high speed.
[0255]
[Effects of the invention]
As described above, according to the wavelet transform device of the present invention, the normalization process that normalizes each input data and the conversion processes that convert each intermediate data into other intermediate data on one series or into output data are executed repeatedly, and at least two of the repeatedly executed processes are executed in parallel within one clock cycle. The output data calculation cycle can therefore be shortened, and the inverse wavelet transform can be executed at high speed in a short time.
[0256]
According to the wavelet transform method of the present invention, the step (b) of normalizing the input data and converting it into first-stage intermediate data, the step (c) of converting the intermediate data into other intermediate data on one series, and the step (d) of converting the final-stage intermediate data into the output data are executed repeatedly, and at least two of the repeatedly executed steps are executed in parallel within one clock cycle. The cycle for calculating the output data from the input data string can therefore be shortened, and the inverse wavelet transform can be executed at high speed in a short time.
[Brief description of the drawings]
FIG. 1 is a diagram showing a schematic configuration of a wavelet transform apparatus according to a first embodiment of the present invention.
FIG. 2 is a schematic configuration diagram of a filtering unit according to the first embodiment.
FIG. 3 is a diagram schematically showing a lifting calculation process according to the first embodiment.
FIG. 4 is a diagram schematically showing a lifting calculation process according to the first embodiment.
FIG. 5 is a diagram schematically showing a lifting calculation process according to the first embodiment.
FIG. 6 is a diagram schematically showing a lifting calculation process according to the first embodiment.
FIG. 7 is a diagram schematically showing a lifting calculation process according to the first embodiment.
FIG. 8 is a diagram schematically showing a lifting calculation process according to the first embodiment.
FIG. 9 is a diagram schematically showing a lifting calculation process according to the first embodiment.
FIG. 10 is a diagram schematically showing a lifting calculation process according to the first embodiment.
FIG. 11 is a diagram schematically illustrating a process of synthesizing an image from subbands.
FIG. 12 is a diagram schematically showing two-dimensional image data and a storage area of a ring memory.
FIG. 13 is a diagram schematically showing a storage area of a ring memory.
FIG. 14 is a diagram showing a schematic configuration of a wavelet transform apparatus according to a second embodiment of the present invention.
FIG. 15 is a schematic configuration diagram of a filtering unit according to the second embodiment.
FIG. 16 is a diagram schematically showing a lifting calculation process according to the second embodiment.
FIG. 17 is a diagram schematically showing a lifting calculation process according to the second embodiment.
FIG. 18 is a diagram schematically showing a lifting calculation process according to the second embodiment.
FIG. 19 is a diagram schematically showing a lifting calculation process according to the second embodiment.
FIG. 20 is a diagram schematically showing two-dimensional image data and a storage area of a ring memory.
FIG. 21 is a diagram schematically illustrating a storage area of a ring memory.
FIG. 22 is a diagram illustrating a schematic configuration of a filtering unit according to a third embodiment of the present invention.
FIG. 23 is a diagram schematically showing a lifting calculation process according to the third embodiment.
FIG. 24 is a diagram schematically showing a lifting calculation process according to the third embodiment.
FIG. 25 is a diagram schematically showing a lifting calculation process according to the third embodiment.
FIG. 26 is a diagram showing a schematic configuration of a wavelet transform device according to a modification of the second and third embodiments.
FIG. 27 is a diagram showing a schematic configuration of a wavelet transform apparatus according to a fourth embodiment of the present invention.
FIG. 28 is a schematic configuration diagram of a vertical filtering unit according to the fourth embodiment.
FIG. 29 is a diagram schematically showing a lifting calculation process according to the fourth embodiment.
FIG. 30 is a diagram schematically showing a lifting calculation process according to the fourth embodiment.
FIG. 31 is a diagram schematically showing a lifting calculation process according to the fourth embodiment.
FIG. 32 is a diagram schematically showing two-dimensional image data and a storage area of a ring memory.
FIG. 33 is a diagram schematically showing a storage area of a ring memory.
FIG. 34 is a diagram showing a schematic configuration of a wavelet transform device according to a modification of the fourth embodiment.
FIG. 35 is a diagram schematically showing a filter bank used in DWT and inverse DWT.
FIG. 36 is a diagram schematically showing image data subjected to two-dimensional DWT at a third-order decomposition level.
FIG. 37 is a lattice diagram schematically showing a lifting configuration on the synthesis side.
FIG. 38 is a diagram schematically showing a calculation method recommended by the JPEG2000 system.
FIG. 39 is a diagram schematically showing a lifting calculation process.
FIG. 40 is a diagram schematically illustrating a lifting calculation process.
FIG. 41 is a diagram schematically showing a lifting calculation process.
FIG. 42 is a diagram schematically showing a lifting calculation process.
FIG. 43 is a diagram schematically illustrating a lifting calculation process.
FIG. 44 is a diagram schematically showing a lifting calculation process.
FIG. 45 is a diagram schematically showing a lifting calculation process.
FIG. 46 is a diagram schematically illustrating a lifting calculation process.
FIG. 47 is a diagram schematically showing a lifting calculation process.
FIG. 48 is a diagram schematically showing a lifting calculation process.
[Explanation of symbols]
1 Wavelet transform device
2 MMU (Memory Management Unit)
3A, 3B ring memory
4A, 4B filtering unit
5 Line buffer circuit

Claims

A wavelet transform device that synthesizes band-divided high-frequency component data and low-frequency component data based on a lifting configuration, comprising:
a control unit; and
a filtering unit that takes in an input data string, in which a first data string consisting of one of the high-frequency component and the low-frequency component and a second data string consisting of the other are arranged alternately in units of pixels, and calculates a synthesized output data string,
wherein the filtering unit includes:
normalization means for executing one or more normalization processes that convert each input data into first-stage intermediate data within one clock cycle per point by multiplying each item of the input data string by a predetermined normalization coefficient; and
intermediate data conversion means for executing one or more conversion processes that convert each item of the first-stage intermediate data normalized by the normalization means into intermediate data on one series over one or more stages within one clock cycle per point, or convert each item of the final-stage intermediate data into output data within one clock cycle per point,
and wherein the control unit
performs control so as to cause the normalization means and the intermediate data conversion means to repeatedly execute the one or more normalization processes and the one or more conversion processes until the output data of all points are calculated, and to cause at least two of the repeatedly executed one or more normalization processes and one or more conversion processes to be executed in parallel within one clock cycle.

The wavelet transform device according to claim 1, wherein the normalization means and the intermediate data conversion means execute the normalization process and the conversion process in parallel.

The wavelet transform device according to claim 1 or 2, wherein
the normalization means includes:
a normalization coefficient multiplier that multiplies each input data item by the normalization coefficient; and
a delay unit that delays the data output from the normalization coefficient multiplier,
the intermediate data conversion means includes:
a two-point arithmetic unit comprising a lifting coefficient multiplier that multiplies one of two points of intermediate data by a predetermined lifting coefficient, and an adder that adds the data output from the lifting coefficient multiplier to the other of the two points of intermediate data; and
an output destination selection unit that takes in the data output from the two-point arithmetic unit and outputs it to an output destination specified by the control unit,
the wavelet transform device further comprises:
a memory management unit; and
a memory that temporarily stores data under the control of the memory management unit,
and the memory management unit performs control so that the data output from the output destination selection unit is transferred to and stored in the memory.

The wavelet transform device according to claim 3, wherein the control unit performs control so that the two-point arithmetic unit repeatedly executes, as the conversion processes, the following processes until the output data of all points has been calculated:
a first conversion process of calculating second-stage temporary data on a second series within one clock cycle per point, by adding first-stage intermediate data on a "series starting from input data belonging to the second data sequence" (hereinafter, the second series) to data obtained by multiplying, by a predetermined lifting coefficient, the first-stage intermediate data on a "series starting from input data belonging to the first data sequence" (hereinafter, the first series) located one point before that intermediate data;
a second conversion process of calculating second-stage intermediate data on the second series within one clock cycle per point, by adding the temporary data calculated in the first conversion process and stored in the memory to data obtained by multiplying, by a predetermined lifting coefficient, the first-stage intermediate data on the first series located one point after the series of that temporary data;
a third conversion process of calculating second-stage temporary data on the first series within one clock cycle per point, by adding first-stage intermediate data on the first series to data obtained by multiplying, by a predetermined lifting coefficient, the second-stage intermediate data on the second series located one point before that intermediate data;
a fourth conversion process of calculating the second-stage intermediate data on the first series within one clock cycle per point, by adding the temporary data calculated in the third conversion process and stored in the memory to data obtained by multiplying, by a predetermined lifting coefficient, the second-stage intermediate data on the second series located one point after the series of that temporary data;
a fifth conversion process of calculating (M+1)-th-stage temporary data on the second series within one clock cycle per point, by adding M-th-stage intermediate data on the second series (the stage number M being an integer of 1 or more) to data obtained by multiplying, by a predetermined lifting coefficient, the M-th-stage intermediate data on the first series located one point before the series of that intermediate data;
a sixth conversion process of calculating (M+1)-th-stage intermediate data on the second series within one clock cycle per point, by adding the temporary data calculated in the fifth conversion process and stored in the memory to data obtained by multiplying, by a predetermined lifting coefficient, the M-th-stage intermediate data on the first series located one point after the series of that temporary data;
a seventh conversion process of calculating (L+1)-th-stage temporary data on the first series within one clock cycle per point, by adding L-th-stage intermediate data on the first series (the stage number L being an integer of 1 or more) to data obtained by multiplying, by a predetermined lifting coefficient, the (L+1)-th-stage intermediate data on the second series located one point before the series of that intermediate data; and
an eighth conversion process of calculating the (L+1)-th-stage intermediate data on the first series within one clock cycle per point, by adding the temporary data calculated in the seventh conversion process and stored in the memory to data obtained by multiplying, by a predetermined lifting coefficient, the (L+1)-th-stage intermediate data on the second series located one point after the series of that temporary data.
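As an illustration only, and not as part of the claims, the pairing of a "temporary data" pass with the following "intermediate data" pass can be sketched in Python. The names `two_point`, `lifting_step_two_point`, `memory` and `key` are hypothetical, a plain dict stands in for the memory, and the sketch assumes that both passes use the same lifting coefficient.

```python
def two_point(a, b, coeff):
    """Two-point operation: multiply b by a lifting coefficient, then add a."""
    return a + coeff * b


def lifting_step_two_point(center, prev_neighbor, next_neighbor, coeff, memory, key):
    """One lifting update carried out as two successive two-point operations.

    memory is a dict standing in for the temporary store (an assumption of
    this sketch); key identifies the slot used for the temporary data.
    """
    # First pass: temporary data from the neighbor one point before.
    memory[key] = two_point(center, prev_neighbor, coeff)
    # Second pass: read the temporary data back and add the contribution
    # of the neighbor one point after, giving the next-stage intermediate data.
    return two_point(memory[key], next_neighbor, coeff)


if __name__ == "__main__":
    mem = {}
    print(lifting_step_two_point(10.0, 2.0, 4.0, -0.5, mem, key=0))
```

Under the stated assumption, the two passes together give the same value as a single update `center + coeff * (prev_neighbor + next_neighbor)`, at the cost of one memory write and one read per point.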

The wavelet transform device according to claim 4, wherein the control unit performs control so that, after the first conversion process and the third conversion process have been executed, the two-point arithmetic unit executes the fifth conversion process and the seventh conversion process until the final-stage temporary data is calculated, and thereafter, after the second conversion process and the fourth conversion process have been executed, the two-point arithmetic unit executes the sixth conversion process and the eighth conversion process until the output data is calculated.

The wavelet transform device according to claim 4, comprising four of the two-point arithmetic units operating independently of one another,
wherein the control unit performs control so that, as the conversion processes, the following four processes are executed in parallel by the respective two-point arithmetic units:
the second conversion process of calculating second-stage intermediate data on the series that belongs to the second data sequence and starts from the P-th input data (the data number P being an integer) in the input data sequence;
the third conversion process of calculating second-stage temporary data on the series starting from the (P-1)-th input data;
the sixth conversion process of calculating (M+1)-th-stage intermediate data on the series starting from the (P-4)-th input data; and
the seventh conversion process of calculating (L+1)-th-stage temporary data on the series starting from the (P-5)-th input data,
and so that the following four processes are likewise executed in parallel by the respective two-point arithmetic units:
the first conversion process of calculating second-stage temporary data on the series starting from the (P+2)-th input data;
the fourth conversion process of calculating the second-stage intermediate data on the series starting from the (P-1)-th input data;
the fifth conversion process of calculating M-th-stage temporary data on the series starting from the (P-2)-th input data; and
the eighth conversion process of calculating (L+1)-th-stage intermediate data on the series starting from the (P-5)-th input data.

The wavelet transform device according to claim 1 or 2, wherein
the normalization means includes:
a normalization coefficient multiplier that multiplies each input data item by the normalization coefficient; and
a delay unit that delays the data output from the normalization coefficient multiplier,
the intermediate data conversion means includes:
a three-point arithmetic unit comprising a first adder that adds first and second input data among three points of input data taken in, a lifting coefficient multiplier that multiplies the data output from the first adder by a predetermined lifting coefficient, and a second adder that calculates intermediate data by adding the data output from the lifting coefficient multiplier to third input data; and
an output destination selection unit that takes in the intermediate data output from the three-point arithmetic unit and outputs it to an output destination specified by the control unit,
and the memory management unit performs control so that the intermediate data output from the output destination selection unit is transferred to and stored in the memory.

The wavelet transform device according to claim 7, wherein the control unit performs control so that the three-point arithmetic unit repeatedly executes, as the conversion processes, the following processes until the output data of all points has been calculated:
a first conversion process of calculating second-stage intermediate data on a second series within one clock cycle per point, by adding first-stage intermediate data on a "series starting from input data belonging to the second data sequence" (hereinafter, the second series) to data obtained by multiplying, by a predetermined lifting coefficient, the sum of the two points of first-stage intermediate data on a "series starting from input data belonging to the first data sequence" (hereinafter, the first series) located one point before and one point after the series of that intermediate data;
a second conversion process of calculating second-stage intermediate data on the first series within one clock cycle per point, by adding first-stage intermediate data on the first series to data obtained by multiplying, by a predetermined lifting coefficient, the sum of the two points of second-stage intermediate data on the second series located one point before and one point after the series of that first-stage intermediate data;
a third conversion process of calculating (M+1)-th-stage intermediate data on the second series within one clock cycle per point, by adding M-th-stage intermediate data on the second series (the stage number M being an integer of 1 or more) to data obtained by multiplying, by a predetermined lifting coefficient, the sum of the two points of M-th-stage intermediate data on the first series located one point before and one point after the series of that M-th-stage intermediate data; and
a fourth conversion process of calculating (L+1)-th-stage intermediate data on the first series within one clock cycle per point, by adding L-th-stage intermediate data on the first series (the stage number L being an integer of 1 or more) to data obtained by multiplying, by a predetermined lifting coefficient, the sum of the two points of (L+1)-th-stage intermediate data on the second series located one point before and one point after the series of that L-th-stage intermediate data.
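The three-point update described above can be sketched in Python as follows; this is an illustration under stated assumptions, not the claimed hardware. The coefficients -1/2 and 1/4 are the 5/3-style lifting coefficients used here without rounding, chosen only as an example (the claims say no more than "predetermined lifting coefficient"). The function names, the symmetric boundary extension in `mirror`, and the layout with low-frequency data at even positions and high-frequency data at odd positions are likewise assumptions of the sketch, and the input is assumed to hold at least two points.

```python
def three_point(center, left, right, coeff):
    """Three-point operation: add two neighbors, scale the sum, add the center."""
    return center + coeff * (left + right)


def mirror(i, n):
    """Whole-sample symmetric extension of index i into the range [0, n)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def lift_pass(y, parity, coeff):
    """Update every point of the given parity from its two opposite-parity neighbors."""
    n = len(y)
    out = list(y)
    for i in range(parity, n, 2):
        out[i] = three_point(y[i], y[mirror(i - 1, n)], y[mirror(i + 1, n)], coeff)
    return out


def analyze(x):
    """Forward lifting: odd points become high-pass, even points low-pass."""
    y = lift_pass(x, 1, -0.5)
    return lift_pass(y, 0, 0.25)


def synthesize(y):
    """Inverse lifting: undo the two passes in reverse order with negated coefficients."""
    x = lift_pass(y, 0, -0.25)
    return lift_pass(x, 1, 0.5)


if __name__ == "__main__":
    signal = [1.0, 4.0, 2.0, 8.0, 6.0, 3.0]
    print(synthesize(analyze(signal)) == signal)
```

Each pass reads only points of the opposite parity, which the pass itself leaves unchanged, so the updates within one pass are independent of one another; that independence is what allows several of them to be scheduled in the same clock cycle.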

The wavelet transform device according to claim 8, comprising two of the three-point arithmetic units operating independently of each other,
wherein the control unit performs control so that the following two processes are executed in parallel by the respective three-point arithmetic units:
the second conversion process of calculating the intermediate data on the series that belongs to the first data sequence and starts from the P-th input data (the data number P being an integer) in the input data sequence; and
the fourth conversion process of calculating the (L+1)-th-stage intermediate data on the series starting from the (P-4)-th input data.

The wavelet transform device according to claim 8 or 9, wherein the control unit performs control so that the following two processes are executed in parallel by the respective three-point arithmetic units:
the first conversion process of calculating the intermediate data on the series starting from the (P+3)-th input data (the data number P being an integer) in the input data sequence; and
the third conversion process of calculating (M+1)-th-stage intermediate data on the series starting from the (P-1)-th input data.

The wavelet transform device according to claim 8, wherein the control unit performs control so that the first to fourth conversion processes are executed in parallel.

The wavelet transform device according to any one of claims 1 to 11, wherein
the filtering unit consists of a first filtering unit and a second filtering unit connected in series,
the first filtering unit receives the high-frequency component data and the low-frequency component data that have been band-divided in one of the horizontal direction and the vertical direction, synthesizes these data, and calculates the result line by line, and
the second filtering unit calculates synthesized data in the other of the horizontal direction and the vertical direction by processing the synthesized data calculated by the first filtering unit.
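The series connection of two filtering units can be pictured with a small Python sketch that applies a one-dimensional synthesis first along rows and then along columns. Everything here is an assumption made for illustration: the helper names, the choice of horizontal-then-vertical order (the claim allows either), and the use of a simple Haar-style lifting pair in place of the filters of the embodiment.

```python
def haar_forward(x):
    """Haar-style lifting on an interleaved line (stand-in filter, not the patent's)."""
    y = list(x)
    for i in range(1, len(y), 2):
        y[i] = y[i] - y[i - 1]          # odd points: high-pass difference
    for i in range(0, len(y) - 1, 2):
        y[i] = y[i] + 0.5 * y[i + 1]    # even points: low-pass average
    return y


def haar_inverse(y):
    """Undo haar_forward by reversing the two passes with opposite signs."""
    x = list(y)
    for i in range(0, len(x) - 1, 2):
        x[i] = x[i] - 0.5 * x[i + 1]
    for i in range(1, len(x), 2):
        x[i] = x[i] + x[i - 1]
    return x


def transpose(m):
    return [list(row) for row in zip(*m)]


def analyze_2d(image, forward):
    """Vertical pass followed by horizontal pass."""
    cols = [forward(c) for c in transpose(image)]
    return [forward(r) for r in transpose(cols)]


def synthesize_2d(coeffs, inverse):
    """First stage: synthesize each row; second stage: synthesize each column."""
    rows = [inverse(r) for r in coeffs]
    cols = [inverse(c) for c in transpose(rows)]
    return transpose(cols)


if __name__ == "__main__":
    img = [[1.0, 2.0, 3.0, 4.0],
           [5.0, 6.0, 7.0, 8.0],
           [9.0, 1.0, 2.0, 3.0],
           [4.0, 5.0, 6.0, 7.0]]
    print(synthesize_2d(analyze_2d(img, haar_forward), haar_inverse) == img)
```

Because the row and column operations are separable, the second stage only needs the lines already produced by the first stage, which is why line-by-line output from the first stage is sufficient.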

A wavelet transform method for synthesizing band-divided high-frequency component data and low-frequency component data based on a lifting scheme, comprising the steps of:
(a) selectively taking in input data from an input data sequence in which a first data sequence consisting of one of the high-frequency component and the low-frequency component and a second data sequence consisting of the other are arranged alternately pixel by pixel;
(b) converting each input data item taken in at step (a) into first-stage intermediate data within one clock cycle per point by multiplying it by a normalization coefficient; and
(c) calculating (m+1)-th-stage intermediate data from m-th-stage intermediate data (m being an integer of 1 or more) within one clock cycle per point, including the case where the m-th-stage intermediate data is the final-stage intermediate data, in which case the (m+1)-th-stage intermediate data is the output data,
wherein step (b) and step (c) are repeatedly executed until the output data of all points has been calculated, and the repeatedly executed step (b) and step (c) are executed in parallel within one clock cycle.
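Step (b) can be sketched in Python as a per-point multiplication of the interleaved input. The function name `normalize`, the parameters `k_even` and `k_odd`, and the values used in the example are hypothetical; the claims do not state which coefficient applies to which component.

```python
def normalize(interleaved, k_even, k_odd):
    """Multiply even-position points by k_even and odd-position points by k_odd.

    The result plays the role of the first-stage intermediate data.
    """
    return [v * (k_even if i % 2 == 0 else k_odd) for i, v in enumerate(interleaved)]


if __name__ == "__main__":
    # Placeholder coefficients chosen only so the arithmetic is easy to follow.
    print(normalize([1.0, 2.0, 3.0, 4.0], k_even=2.0, k_odd=0.5))
```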

The wavelet transform method according to claim 13, wherein step (c) comprises the steps of:
(c-1) calculating second-stage temporary data on a second series within one clock cycle per point, by adding first-stage intermediate data on a "series starting from input data belonging to the second data sequence" (hereinafter, the second series) to data obtained by multiplying, by a predetermined lifting coefficient, the first-stage intermediate data on a "series starting from input data belonging to the first data sequence" (hereinafter, the first series) located one point before that intermediate data;
(c-2) calculating second-stage intermediate data on the second series within one clock cycle per point, by adding the temporary data calculated at step (c-1) and stored in the memory to data obtained by multiplying, by a predetermined lifting coefficient, the first-stage intermediate data on the first series located one point after the series of that temporary data;
(c-3) calculating second-stage temporary data on the first series within one clock cycle per point, by adding first-stage intermediate data on the first series to data obtained by multiplying, by a predetermined lifting coefficient, the second-stage intermediate data on the second series located one point before that intermediate data;
(c-4) calculating second-stage intermediate data on the first series within one clock cycle per point, by adding the temporary data calculated at step (c-3) and stored in the memory to data obtained by multiplying, by a predetermined lifting coefficient, the second-stage intermediate data on the second series located one point after the series of that temporary data;
(c-5) calculating (M+1)-th-stage temporary data on the second series within one clock cycle per point, by adding M-th-stage intermediate data on the second series (the stage number M being an integer of 1 or more) to data obtained by multiplying, by a predetermined lifting coefficient, the M-th-stage intermediate data on the first series located one point before the series of that intermediate data;
(c-6) calculating (M+1)-th-stage temporary data on the second series within one clock cycle per point, by adding the temporary data calculated at step (c-5) and stored in the memory to data obtained by multiplying, by a predetermined lifting coefficient, the M-th-stage intermediate data on the first series located one point after the series of that temporary data;
(c-7) calculating (L+1)-th-stage temporary data on the first series within one clock cycle per point, by adding the L-th-stage intermediate data on the first series (the stage number L being an integer of 1 or more) to data obtained by multiplying, by a predetermined lifting coefficient, the (L+1)-th-stage intermediate data on the second series located one point before the series of that intermediate data; and
(c-8) calculating (L+1)-th-stage intermediate data on the first series within one clock cycle per point, by adding the temporary data calculated at step (c-7) and stored in the memory to data obtained by multiplying, by a predetermined lifting coefficient, the (L+1)-th-stage intermediate data on the second series located one point after the series of that temporary data,
and wherein control is performed so that steps (c-1) to (c-8) are repeatedly executed until the output data of all points has been calculated.

The wavelet transform method according to claim 14, wherein
after step (c-1) and step (c-3) have been executed, step (c-5) and step (c-7) are executed until the temporary data of the output data is calculated, and
thereafter, after step (c-2) and step (c-4) have been executed, step (c-6) and step (c-8) are executed until the output data is calculated.

The wavelet transform method according to claim 14, wherein control is performed so that the following four steps are executed in parallel by the respective two-point arithmetic units:
step (c-2) of calculating second-stage intermediate data on the series that belongs to the second data sequence and starts from the P-th input data (the data number P being an integer) in the input data sequence;
step (c-3) of calculating second-stage temporary data on the series starting from the (P-1)-th input data;
step (c-6) of calculating (M+1)-th-stage intermediate data on the series starting from the (P-4)-th input data; and
step (c-7) of calculating (L+1)-th-stage temporary data on the series starting from the (P-5)-th input data,
and so that the following four processes are likewise executed in parallel:
step (c-1) of calculating second-stage temporary data on the series starting from the (P+2)-th input data;
step (c-4) of calculating the second-stage intermediate data on the series starting from the (P-1)-th input data;
step (c-5) of calculating (M+1)-th-stage temporary data on the series starting from the (P-2)-th input data; and
step (c-8) of calculating (L+1)-th-stage intermediate data on the series starting from the (P-5)-th input data.

The wavelet transform method according to claim 13, wherein step (c) comprises the steps of:
(c-1) calculating second-stage intermediate data on a second series within one clock cycle per point, by adding first-stage intermediate data on a "series starting from input data belonging to the second data sequence" (hereinafter, the second series) to data obtained by multiplying, by a predetermined lifting coefficient, the sum of the two points of first-stage intermediate data on a "series starting from input data belonging to the first data sequence" (hereinafter, the first series) located one point before and one point after the series of that intermediate data;
(c-2) calculating second-stage intermediate data on the first series within one clock cycle per point, by adding first-stage intermediate data on the first series to data obtained by multiplying, by a predetermined lifting coefficient, the sum of the two points of second-stage intermediate data on the second series located one point before and one point after the series of that intermediate data;
(c-3) calculating (M+1)-th-stage intermediate data on the second series within one clock cycle per point, by adding M-th-stage intermediate data on the second series (the stage number M being an integer of 1 or more) to data obtained by multiplying, by a predetermined lifting coefficient, the sum of the two points of M-th-stage intermediate data on the first series located one point before and one point after the series of that M-th-stage intermediate data; and
(c-4) calculating (L+1)-th-stage intermediate data on the first series within one clock cycle per point, by adding L-th-stage intermediate data on the first series (the stage number L being an integer of 1 or more) to data obtained by multiplying, by a predetermined lifting coefficient, the sum of the two points of (L+1)-th-stage intermediate data on the second series located one point before and one point after the series of that L-th-stage intermediate data,
and wherein steps (c-1) to (c-4) are repeatedly executed until the output data of all points has been calculated.

The wavelet transform method according to claim 17, wherein control is performed so that the following two processes are executed in parallel:
step (c-2) of calculating second-stage intermediate data on the series that belongs to the first data sequence and starts from the P-th input data (the data number P being an integer) in the input data sequence; and
step (c-4) of calculating the (L+1)-th-stage intermediate data on the series starting from the (P-4)-th input data.

The wavelet transform method according to claim 17 or 18, wherein control is performed so that the following two processes are executed in parallel:
step (c-1) of calculating intermediate data on the series starting from the (P+3)-th input data (the data number P being an integer) in the input data sequence; and
step (c-3) of calculating (M+1)-th-stage intermediate data on the series starting from the (P-1)-th input data.

The wavelet transform method according to claim 17, wherein steps (c-1) to (c-4) are executed in parallel.

The wavelet transform method according to any one of claims 13 to 20, wherein, for two-dimensional image data band-divided into a low-frequency component and a high-frequency component, a synthesized data sequence is calculated by applying steps (a) to (c) line by line in one of the horizontal direction and the vertical direction of the two-dimensional image data, and steps (a) to (c) are then applied to the calculated synthesized data sequence in the other of the horizontal direction and the vertical direction.