JP4372371B2 - Thinning device and enlargement device in SIMD type microprocessor

Info

Publication number: JP4372371B2
Application number: JP2001103319A
Authority: JP
Inventors: 慎一山浦
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2001-04-02
Filing date: 2001-04-02
Publication date: 2009-11-25
Anticipated expiration: 2021-04-02
Also published as: JP2002298135A

Description

【０００１】
【発明の属する技術分野】
本発明は、ＳＩＭＤ（ＳｉｎｇｌｅＩｎｓｔｒｕｃｔｉｏｎ−ｓｔｒｅａｍＭｕｌｔｉｐｌｅＤａｔａ−ｓｔｒｅａｍ；単一命令多データ処理）型マイクロプロセッサに関する。
【０００２】
【従来の技術】
ＳＩＭＤ型マイクロプロセッサでは、複数のデータに対して１つの命令で同時に同一の演算処理が実行可能である。この構造により、演算は同一であるがデータ量が非常に多い処理（例えば、画像処理）に係る用途において、頻用される。
【０００３】
ＳＩＭＤ型マイクロプロセッサにおける通常の演算処理では、複数の演算ユニット（ＰｒｏｃｅｓｓｏｒＥｌｅｍｅｎｔ〔ＰＥ〕；プロセッサエレメント）を並べ同一の演算を同時に複数のデータに対して実行する。
【０００４】
ＳＩＭＤ型マイクロプロセッサは、全てのＰＥが同時に動作する処理においてはその性能を十分に発揮できる。
【０００５】
ところで、画像データの（演算）処理においては、処理対象の画像データの領域が、例えば、「文字領域」であるのか又は「写真領域」であるのか、を検出し、その検出結果を利用して領域別に処理内容を変更することがある。このような検出の場合、判定対象データとして相当に広い領域範囲の画素データの処理が必要とされる。ところが、ＳＩＭＤ型マイクロプロセッサにおいては、画素データを直接操作しようとすると、一度に処理できるのは精々「８×８」画素程度の範囲のデータに過ぎない。そこで、より広い範囲の画素データを処理するために、例えば「４×４」画素データを１つのデータ（ブロック）に変換し（ブロック化と言う。）、このブロックに対して領域判定処理を行なうことがある。「４×４」画素を１ブロックとしそのブロックを「８×８」個集めてその範囲で処理を行うと、結局「３２×３２」画素の範囲のデータが処理対象となる。図２２にその様子を示す。従って、相当に広い範囲の画素データが一度に扱えることになる。
【０００６】
上記のブロック化処理では、縮小処理が必須となる。
【０００７】
しかし、ＳＩＭＤ型マイクロプロセッサでは通常この縮小処理を実施することは困難である。従来のＳＩＭＤ型マイクロプロセッサにおいては、そのため種々の工夫がなされている。
【０００８】
例えば、従来技術においては、ＰＥに内蔵された出力レジスタに対しデータを出力するかどうかの指示を与えることにより外部に出力するデータの選択を行い、よって縮小を実現しているものがある。
【０００９】
特開平８−１２３６８３号では、プロセッサから外部へ出力する出力シフトレジスタのシフト動作を制御して飛び飛びの画素データを出力することで、縮小処理を実現している。
【００１０】
けれども、上記技術の場合には、出力データを縮小することにのみ限定され、演算の途中のデータを縮小することはできない。然も、各ＰＥにて、データを出力するか否かを制御する回路を付加することが必要である（コスト高の問題を生じる）。さらに、縮小したデータを拡大しようとしても、上記の技術では全く対応できない。
【００１１】
また、例えば、別の従来技術においては、ＰＥに内蔵されたレジスタのデータを外部に出力し外部のＦＩＦＯ（ＦｉｒｓｔＩｎＦｉｒｓｔＯｕｔ）メモリにデータを保存する際に、任意のデータを間引いて該ＦＩＦＯメモリに格納し、その後ＦＩＦＯメモリから各ＰＥのレジスタにデータを転送して、縮小を実現している。この場合にも、１ライン分のＦＩＦＯメモリが付加的に必要となる（コスト高の問題がある）。
【００１２】
類似の従来技術が、特開平９−２１２６３７号で開示されている。そこでは、パラレル／シリアル変換器とシリアル／パラレル変換器をプロセッサエレメントに持たせ、よって縮小・拡大処理を実現している。この従来技術においても、上記のハードウエアを搭載することが必要であるため、縮小動作が不要な場合に不要なコストが発生する問題がある。
【００１３】
【発明が解決しようとする課題】
本発明は、ＳＩＭＤ型マイクロプロセッサにおいて、簡易な構成及び低コストにより、間引き処理、即ち縮小処理、及び拡大処理を行なうことを目的とする。
【００１４】
【課題を解決するための手段】
本発明は、上記の目的を達成するためになされたものである。本発明に係る請求項１に記載のデータ間引き装置は、
複数のプロセッサエレメントを含むＳＩＭＤ型マイクロプロセッサが有する、
プロセッサエレメントが内蔵している汎用レジスタに上記マイクロプロセッサ外部からアクセスするための、データ転送用ポートに接続されているデータ間引き装置であり、
内部に含まれるシーケンサにより制御され、
上記ＳＩＭＤ型マイクロプロセッサに含まれるグローバルプロセッサより行われる動作開始の指示設定に応じて自動的に動作することで上記ＳＩＭＤ型マイクロプロセッサのＳＩＭＤ演算処理と並列して処理を実施し、
前記プロセッサエレメントが内蔵している汎用レジスタに格納されるデータの中から任意のデータを複数選択して読み出し、その後それら複数のデータを汎用レジスタに書き戻し、
書き戻しによりそれら複数のデータが格納されるプロセッサエレメントの間隔が、読み出し時にそれら複数のデータが格納されていたプロセッサエレメントの間隔よりも、小さいことを特徴とする。
【００１５】
本発明に係る請求項２に記載のデータ拡大装置は、
複数のプロセッサエレメントを含むＳＩＭＤ型マイクロプロセッサが有する、
プロセッサエレメントが内蔵している汎用レジスタに上記マイクロプロセッサ外部からアクセスするための、データ転送用ポートに接続されているデータ拡大装置であり、
内部に含まれるシーケンサにより制御され、
上記ＳＩＭＤ型マイクロプロセッサに含まれるグローバルプロセッサより行われる動作開始の指示設定に応じて自動的に動作することで上記ＳＩＭＤ型マイクロプロセッサのＳＩＭＤ演算処理と並列して処理を実施し、
前記プロセッサエレメントが内蔵している汎用レジスタに格納されるデータの中から任意のデータを複数選択して読み出し、その後それら複数のデータを汎用レジスタに書き戻し、
書き戻しによりそれら複数のデータが格納されるプロセッサエレメントの間隔が、読み出し時にそれら複数のデータが格納されていたプロセッサエレメントの間隔よりも、大きいことを特徴とする。
【００１６】
本発明に係る請求項３に記載のデータ間引き装置は、
読み出されたデータにおいて、所定の位置の１ビットに格納されるデータ、若しくは所定の位置の複数ビットに格納されるデータを、選択し、
汎用レジスタに書き戻す際には、１つの汎用レジスタ上に、複数組の、上記データを格納させる、
請求項１に記載の、データ間引き装置である。
【００１７】
本発明に係る請求項４に記載のデータ間引き装置は、
選択される任意のデータ以外のデータであって、プロセッサエレメント全体により構成される列の第１の端部近傍に位置するプロセッサエレメントが内蔵する汎用レジスタに格納される所定のデータを、読み出し、
それらデータを、プロセッサエレメント全体により構成される列の第２の端部近傍に位置するプロセッサエレメントが内蔵している汎用レジスタに、書き戻し、
同時に、
選択される任意のデータ以外のデータであって、プロセッサエレメント全体により構成される列の、第２の端部近傍に位置するプロセッサエレメントが内蔵する汎用レジスタに格納される所定のデータを、読み出し、
それらデータを、プロセッサエレメント全体により構成される列の第１の端部近傍に位置するプロセッサエレメントが内蔵している汎用レジスタに、書き戻す、
請求項１に記載の、データ間引き装置である。
【００１８】
本発明に係る請求項５に記載のデータ拡大装置は、
読み出されたデータを、１ビットのデータ若しくは複数ビットのデータに、分割し、
上記分割により形成される複数組の上記データを、別のプロセッサエレメントに内蔵される汎用レジスタに書き戻す、
請求項２に記載の、データ拡大装置である。
【００１９】
本発明に係る請求項６に記載のデータ拡大装置は、
書き戻しによりデータが格納される第１のプロセッサエレメントと、書き戻しによりデータが格納される第２のプロセッサエレメントとにおいて、
上記第１のプロセッサエレメントと上記第２のプロセッサエレメントとの間に、１つの又は複数の連続する第３のプロセッサエレメントが存在し、それら第３のプロセッサエレメントには、データ書き戻しは行なわれないとき、
第１のプロセッサエレメントに書き戻されるデータ、又は第２のプロセッサエレメントに書き戻されるデータが、
第３のプロセッサエレメントに内蔵される汎用レジスタに、複写して書かれる、
請求項２に記載の、データ拡大装置である。
【００２０】
本発明に係る請求項７に記載のデータ間引き装置は、
縮小対象のデータを構成する最小構成単位が縮小対象のデータにいくつ含まれるかを示す個数分と等しい、ビット数までを、少なくともビット容量として備えるメモリ部を含み、
縮小対象のデータを構成する個々の最小構成単位にて間引き対象であるものと、上記メモリ部のビット位置とを対応させ、対応するビット位置に所定のデータを格納し、
その格納された所定のデータに基づいて、
プロセッサエレメントが内蔵している汎用レジスタに格納されるデータからの上記読み出し処理と、その後の汎用レジスタへの上記書き戻し処理とを行う
請求項１に記載の、データ間引き装置である。
【００２１】
本発明に係る請求項８に記載のデータ拡大装置は、
拡大処理後のデータを構成する最小構成単位が拡大処理後のデータにいくつ含まれるかを示す個数分と等しい、ビット数までを、少なくともビット容量として備えるメモリ部を含み、
拡大処理後のデータを構成する個々の最小構成単位にて拡大制御対象であるものと、上記メモリ部のビット位置とを対応させ、対応するビット位置に所定のデータを格納し、
その格納された所定のデータに基づいて、
プロセッサエレメントが内蔵している汎用レジスタに格納されるデータからの上記読み出し処理と、その後の汎用レジスタへの上記書き戻し処理とを行う
請求項２に記載の、データ拡大装置である。
【００２２】
【発明の実施の形態】
以下、添付の図面を参照しつつ、本発明に係る好適な実施の形態を説明する。
【００２３】
図１及び図２１は、本発明に係るＳＩＭＤ型マイクロプロセッサ２の概略の構成を示すブロック図である。図１及び図２１の構成は、後で説明する第１の実施の形態乃至第８の実施の形態のＳＩＭＤ型マイクロプロセッサ２の、基礎となる構成である。
【００２４】
図１のＳＩＭＤ型マイクロプロセッサ２は、概略、グローバルプロセッサ４、レジスタファイル６、演算アレイ８、及び間引き器１０若しくは拡大器１２から構成される。
【００２５】
（１）グローバルプロセッサ４
このグローバルプロセッサ４そのものは、いわゆるＳＩＳＤ（Ｓｉｎｇｌｅ Ｉｎｓｔｒｕｃｔｉｏｎ Ｓｔｒｅａｍ，Ｓｉｎｇｌｅ Ｄａｔａ Ｓｔｒｅａｍ）型のプロセッサであり、プログラムＲＡＭとデータＲＡＭを内蔵し、プログラムを解読し各種制御信号を生成する。この制御信号は内蔵する各種ブロック以外に、レジスタファイル６、演算アレイ８に供給される。また、ＧＰ（グローバルプロセッサ）命令実行時は内蔵する汎用レジスタ、ＡＬＵ（算術論理演算器）等を使用して各種演算処理、プログラム制御処理を行う。
【００２６】
（２）レジスタファイル６
ＰＥ（プロセッサエレメント）命令で処理されるデータを保持している。ＰＥ（プロセッサエレメント）３は、公知のように、ＳＩＭＤ（ＳｉｎｇｌｅＩｎｓｔｒｕｃｔｉｏｎ−Ｓｔｒｅａｍ，ＭｕｌｔｉｐｌｅＤａｔａ−Ｓｔｒｅａｍ）型プロセッサにおいて個別の演算を実行する構成単位である。図２１のレジスタファイル６及び演算アレイ８が示すように、図２１のＳＩＭＤ型マイクロプロセッサ２では２５６個のＰＥ３を含んでいる。上記のＰＥ命令はＳＩＭＤ型の命令であり、レジスタファイル６に保持されている複数のデータに対し、同時に同じ処理を行なう。このレジスタファイル６からのデータの読み出し／書き込みの制御はグローバルプロセッサ４からの制御信号によって行なわれる。読み出されたデータは演算アレイ８（演算部）に送られ、演算アレイ８（演算部）での演算処理後にレジスタファイルに書き込まれる。
【００２７】
図２１のレジスタファイル６においては、１つのＰＥ単位に８ビットのレジスタ３４が３２本内蔵されており、２５６個のＰＥ分の（３２本の）組がアレイ構成になっている。レジスタ３４はＰＥ毎に、Ｒ０、Ｒ１、Ｒ２、・・・Ｒ３１と呼ばれる。
【００２８】
また、レジスタファイル６はプロセッサ２外部からのアクセスが可能であり、グローバルプロセッサ４の制御とは別に、外部から特定のレジスタに対し読み出し／書き込みが行なわれる。
【００２９】
（３）演算アレイ
ＰＥ命令の演算処理が行なわれる。処理の制御はすべてグローバルプロセッサ４から行なわれる。
【００３０】
レジスタファイル６と演算アレイ８との接続部位に、７ｔｏ１（７対１）のマルチプレクサ４２が置かれている。図２１及び図４に示すように、あるマルチプレクサ４２から見て、左方向の３つのＰＥ３に含まれるＲ０〜Ｒ３１レジスタ３４のデータと、右方向の３つのＰＥ３に含まれるＲ０〜Ｒ３１レジスタ３４のデータと、自らが属するＰＥ３に含まれるＲ０〜Ｒ３１レジスタ３４のデータを、演算対象として選択し得るように設定されている。
【００３１】
各ＰＥ３には、ＰＥ番号と呼ばれる通し番号が付されている。図２１のＳＩＭＤ型マイクロプロセッサ２では、ＰＥの個数が２５６個であるので、８ビットのビット列（即ち、００００００００ｂ〜１１１１１１１１ｂの２５６通り。ここで、上記のような末尾の“ｂ”は２進法表記であることを表す。）が、各ＰＥ３にＰＥ番号データとして与えられる。ＰＥ番号は、各ＰＥ３に対し、ＰＥの位置とは無関係に与えられても構わないが、本明細書においては、（右）端から順に付されているものとする。また、ＰＥ番号が「ｎ」であるＰＥを、
・ＰＥ［ｎ］
と表すことにする。従って、図２１のＳＩＭＤ型マイクロプロセッサ２は、右方から、ＰＥ［０］、ＰＥ［１］、ＰＥ［２］、・・・ＰＥ［２５４］、ＰＥ［２５５］により、構成される。
【００３２】
（４）間引き器１０、又は拡大器１２
レジスタファイル６から読み出されたデータから、任意のデータを選択してレジスタファイル６に書き込む。書き込む場合には、通常、下位番号のＰＥ３から詰めて書き込まれる。例えば、２５６個のＰＥ３において、１／４の間引きの処理が行なわれる場合、ＰＥ［０］、ＰＥ［４］、ＰＥ［８］、ＰＥ［１２］、ＰＥ［１６］・・・における特定のレジスタ３４のデータが読み出され、それらデータは、ＰＥ［０］、ＰＥ［１］、ＰＥ［２］、ＰＥ［３］、ＰＥ［４］・・に書き込まれる。「間引きの処理」については、後で詳しく説明する。
【００３３】
≪第１の実施の形態≫
図２は、本発明の第１の実施の形態に係るＳＩＭＤ型マイクロプロセッサ２のブロック図である。
【００３４】
図２において、間引き器１０は、プロセッサ２のレジスタファイル６の外部インタフェース１４に接続される。このレジスタファイル６の外部インタフェース１４では、アドレス、ＣＬＫ、Ｒ／Ｗの信号がプロセッサ２外部より与えられると、アドレスで示されたＰＥ３のレジスタ３４にアクセスでき、ＲｅａｄＤａｔａバス１６によりレジスタ３４の内容が読み出され、ＷｒｉｔｅＤａｔａバス１８によりレジスタ３４にデータが書き込まれる。
【００３５】
間引き器１０は、アクセス対象のＰＥのアドレスを生成するアドレス生成回路２０、読み出されたデータを書き込みまでの期間保持するデータバッファ２２、それらを制御するシーケンサ２４により構成され、プロセッサ２のグローバルプロセッサ（ＧＰ）４より動作の設定が行なわれる。
【００３６】
上記の構成において、２５６個のプロセッサエレメント３を有するＳＩＭＤ型マイクロプロセッサ２での、１／４の間引き動作は以下のようになる。
【００３７】
（１）ＰＥ［０］（アドレス０）のレジスタのデータを読み出し、データバッファ２２に格納する。
（２）ＰＥ［０］（アドレス０）のレジスタへ、データバッファ２２のデータを書き込む。
（３）ＰＥ［４］（アドレス４）のレジスタのデータを読み出し、データバッファ２２に格納する。
（４）ＰＥ［１］（アドレス１）のレジスタへ、データバッファ２２のデータを書き込む。
（５）ＰＥ［８］（アドレス８）のレジスタのデータを読み出し、データバッファ２２に格納する。
（６）ＰＥ［２］（アドレス２）のレジスタへ、データバッファ２２のデータを書き込む。
（７）ＰＥ［１２］（アドレス１２）のレジスタのデータを読み出し、データバッファ２２に格納する。
（８）ＰＥ［３］（アドレス３）のレジスタへ、データバッファ２２のデータを書き込む。
・・・・（同様の動作が繰り返されるため、途中略する。）
（１２３）ＰＥ［２４４］（アドレス２４４）のレジスタのデータを読み出し、データバッファ２２に格納する。
（１２４）ＰＥ［６１］（アドレス６１）のレジスタへ、データバッファ２２のデータを書き込む。
（１２５）ＰＥ［２４８］（アドレス２４８）のレジスタのデータを読み出し、データバッファ２２に格納する。
（１２６）ＰＥ［６２］（アドレス６２）のレジスタへ、データバッファ２２のデータを書き込む。
（１２７）ＰＥ［２５２］（アドレス２５２）のレジスタのデータを読み出し、データバッファ２２に格納する。
（１２８）ＰＥ［６３］（アドレス６３）のレジスタへ、データバッファ２２のデータを書き込む。
【００３８】
なお、最初｛上記（１）及び（２）｝のＰＥ［０］からの読み出し・書き込み動作は、データに変換が無いため省略可能である。
【００３９】
上記の間引きの動作により、４個のＰＥおきのレジスタのデータが、ＰＥ［０］方向に詰めて配置されることになり、結果としてＰＥ［０］〜ＰＥ［６３］のレジスタに１／４に間引きされたデータが得られることになる。また、この処理は１２８ステップが必要であるが、間引き器１０はＧＰ４より開始の指示後は自動的に動作するため、プロセッサ２の命令によるＳＩＭＤ演算処理は並列に実行され得る。
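As a rough illustration of steps (1) to (128) above, the following Python sketch models one register (for example R0) across the 256 PEs as a list and alternates a read into the data buffer with a write-back. The function name `thin` and the list model are assumptions made for illustration only; the patent describes hardware (address generation circuit 20, data buffer 22, sequencer 24), not software.

```python
def thin(registers, factor=4):
    """Model of the thinning sequence: read every `factor`-th PE, pack toward PE[0]."""
    regs = list(registers)        # one register (e.g. R0) viewed across all PEs
    write_addr = 0                # next PE address to write back to
    steps = 0                     # one step per read and one per write
    for read_addr in range(0, len(regs), factor):
        data_buffer = regs[read_addr]     # read PE[read_addr] into the data buffer
        regs[write_addr] = data_buffer    # write the buffer back to PE[write_addr]
        write_addr += 1
        steps += 2
    return regs, write_addr, steps


if __name__ == "__main__":
    regs, count, steps = thin(list(range(256)))
    print(regs[:8], count, steps)
```

Because the write address never overtakes the read address, a single one-word buffer suffices in this model and no source value is overwritten before it has been read.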
【００４０】
≪第２の実施の形態≫
図３は、本発明の第２の実施の形態に係るＳＩＭＤ型マイクロプロセッサ２のブロック図である。
【００４１】
図３において、拡大器１２は、プロセッサ２のレジスタファイル６の外部インタフェース１４に接続される。拡大器１２は、アクセス対象のＰＥ３のアドレスを生成するアドレス生成回路２０、読み出されたデータを書き込みまでの期間保持するデータバッファ２２、それらを制御するシーケンサ２４により構成され、プロセッサ２のグローバルプロセッサ（ＧＰ）４より動作の設定が行なわれる。
【００４２】
上記の構成において、２５６個のプロセッサエレメント３を有するＳＩＭＤ型マイクロプロセッサ２での、４倍の拡大動作は以下のようになる。
【００４３】
（１）ＰＥ［６３］（アドレス６３）のレジスタのデータを読み出し、データバッファ２２に格納する。
（２）ＰＥ［２５２］（アドレス２５２）のレジスタへ、データバッファ２２のデータを書き込む。
（３）ＰＥ［６２］（アドレス６２）のレジスタのデータを読み出し、データバッファ２２に格納する。
（４）ＰＥ［２４８］（アドレス２４８）のレジスタへ、データバッファ２２のデータを書き込む。
（５）ＰＥ［６１］（アドレス６１）のレジスタのデータを読み出し、データバッファ２２に格納する。
（６）ＰＥ［２４４］（アドレス２４４）のレジスタへ、データバッファ２２のデータを書き込む。
（７）ＰＥ［６０］（アドレス６０）のレジスタのデータを読み出し、データバッファ２２に格納する。
（８）ＰＥ［２４０］（アドレス２４０）のレジスタへ、データバッファ２２のデータを書き込む。
・・・・（同様の動作が繰り返されるため、途中略する。）
（１２３）ＰＥ［２］（アドレス２）のレジスタのデータを読み出し、データバッファ２２に格納する。
（１２４）ＰＥ［８］（アドレス８）のレジスタへ、データバッファ２２のデータを書き込む。
（１２５）ＰＥ［１］（アドレス１）のレジスタのデータを読み出し、データバッファ２２に格納する。
（１２６）ＰＥ［４］（アドレス４）のレジスタへ、データバッファ２２のデータを書き込む。
（１２７）ＰＥ［０］（アドレス０）のレジスタのデータを読み出し、データバッファ２２に格納する。
（１２８）ＰＥ［０］（アドレス０）のレジスタへ、データバッファ２２のデータを書き込む。
【００４４】
なお、最後｛上記（１２７）及び（１２８）｝のＰＥ［０］からの読み出し・書き込み動作は、データに変換が無いため省略可能である。
【００４５】
上記の拡大の動作により、ＰＥ［０］〜ＰＥ［６３］のレジスタのデータが、４個のＰＥおきにＰＥ［０］から間隔を取ってレジスタに配置されることになり、結果として４倍に拡大されたデータが得られることになる。また、この処理は１２８ステップが必要であるが、拡大器１２はＧＰ４より開始の指示後は自動的に動作するため、プロセッサ２の命令によるＳＩＭＤ演算処理は並列に実行され得る。
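The enlargement sequence in steps (1) to (128) above can be sketched in the same way. The sketch below is a minimal software model under the assumption that one register across the PEs is a Python list; the name `expand` is hypothetical. It walks the source PEs from the highest address down, as the listed steps do.

```python
def expand(registers, factor=4, count=64):
    """Model of the enlargement sequence: spread PE[0..count-1] to every `factor`-th PE."""
    regs = list(registers)
    steps = 0
    for src in range(count - 1, -1, -1):   # PE[63], PE[62], ..., PE[0]
        data_buffer = regs[src]            # read PE[src] into the data buffer
        regs[src * factor] = data_buffer   # write it back to PE[src * factor]
        steps += 2
    return regs, steps


if __name__ == "__main__":
    source = list(range(100, 164)) + [0] * 192
    regs, steps = expand(source)
    print(regs[0], regs[4], regs[252], steps)
```

Processing in descending order matters in this model: a write to PE[4n] can only land on a source PE that has already been read, so nothing is lost. PEs whose number is not a multiple of the factor keep whatever they held before.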
【００４６】
≪第３の実施の形態≫
従来技術で説明した領域判定処理のブロック化処理において、ブロック単位に保持するデータは、各ブロックに対する所定の判定の結果を示すデータであることが多い。しかも通常は、所定の判定に関して「一致／不一致」のみを示すデータである。このようなデータは１ビットのデータとして表現でき、よってブロック単位に保持するデータは、１ビットだけであることが多い。
【００４７】
図４（及び図２１）のブロック図にて示すように、本発明に係るＳＩＭＤ型マイクロプロセッサ２では、あるＰＥ３からみると、自らが属するＰＥ３に含まれるＲ０〜Ｒ３１レジスタ３４のデータと、左右夫々の方向の３つのＰＥ３に含まれるＲ０〜Ｒ３１レジスタ３４のデータとを、演算対象として選択し得るように設定されている。そのため、本発明に係るＳＩＭＤ型マイクロプロセッサ２では、ある画素を中心にして（それ自身を含めて）左右７画素のデータを１回のアクセスにより処理することが可能であるが、その範囲を越えてデータアクセスしようとすると、ライン（主走査）方向でのシフト操作等が必要となってしまう。
【００４８】
ところで、上記の領域判定処理では、広い範囲のデータが必要とされることがあり、例えば、ブロックデータで３２ブロック程度のデータが必要とされる場合がある。仮に３２ブロックまでを１回の操作でアクセス可能となる設定を構成しようとすると、３２の対象から１つを選択させるようなマルチプレクサが必要となる。そうするとハードウエアに係るコストが上昇する。一方で、ライン方向にシフト操作を繰り返す構成を形成する場合、相当に多くのステップ数が必要になり性能を低下させてしまうことになり得る。そこで、上記ブロックデータのように１ビットのみで構成されているデータの場合は、１つのＰＥ３のレジスタに、複数のＰＥ３分のブロックデータを格納することにより、上記問題点を解決することが想定され得る。
【００４９】
本発明に係るＳＩＭＤ型マイクロプロセッサ２では、１つのレジスタ（例えば、Ｒ０）には８ビットのデータを格納可能である。よって、図５のようにデータを格納すれば、左右３５ブロックまで１回のアクセスで扱えるようになる。なお、図５では、ｂｉｔ５、ｂｉｔ６及びｂｉｔ７は、無効データとして扱う（５×１ビット・パック処理）。
【００５０】
仮に、ブロックデータが２ビット必要な場合には、１つのレジスタを２ビットで分割して、複数のＰＥ分のデータを格納することも可能である。本発明に係るＳＩＭＤ型マイクロプロセッサ２では、レジスタは８ビットであるため、１つのＰＥ３（のレジスタ）に２ビットのブロック後データが４個まで格納可能である。その場合は、左右２８ブロックまで１回のアクセスで扱えるようになる（４×２ビット・パック処理）。
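The figures of 35 and 28 blocks quoted above follow from the 7-to-1 multiplexer (the PE itself plus three neighbours on each side) and the 8-bit register width. A small sketch of that arithmetic, with hypothetical names, is:

```python
REGISTER_BITS = 8   # width of one PE register, as stated in the description
MUX_INPUTS = 7      # own PE plus three PEs to the left and three to the right


def reachable_blocks(bits_per_block, blocks_per_register):
    """Number of block values visible to one PE in a single access."""
    if bits_per_block * blocks_per_register > REGISTER_BITS:
        raise ValueError("packed blocks do not fit in one register")
    return MUX_INPUTS * blocks_per_register


if __name__ == "__main__":
    print(reachable_blocks(1, 5))   # 5 x 1-bit pack
    print(reachable_blocks(2, 4))   # 4 x 2-bit pack
```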
【００５１】
図６は、本発明の第３の実施の形態に係るＳＩＭＤ型マイクロプロセッサ２のブロック図である。
【００５２】
図２に示される第１の実施の形態のＳＩＭＤ型マイクロプロセッサ２に対して、データバッファ２２への書き込みにバレルシフタ２６が追加されており、レジスタ３４の任意のビット位置のデータを、データバッファ２２の任意のビットに書き込むことが可能な構成となっている。
【００５３】
上記の構成において、２５６個のプロセッサエレメント３を有するＳＩＭＤ型マイクロプロセッサ２での、「１／４の間引き」且つ「５×１ビット・パック」の動作は以下のようになる。ここで、ＰＥの「レジスタのｂｉｔ０」とは、所定のレジスタの最下位ビットを示す。同様に、「ｂｉｔ１」、「ｂｉｔ２」、「ｂｉｔ３」、「ｂｉｔ４」は、ｂｉｔ０からみて順に１つずつ上位のビットを示す。
【００５４】
（１）ＰＥ［０］のレジスタのｂｉｔ０を読み出し、データバッファ２２のｂｉｔ０に格納する。
（２）ＰＥ［４］のレジスタのｂｉｔ０を読み出し、データバッファ２２のｂｉｔ１に格納する。
（３）ＰＥ［８］のレジスタのｂｉｔ０を読み出し、データバッファ２２のｂｉｔ２に格納する。
（４）ＰＥ［１２］のレジスタのｂｉｔ０を読み出し、データバッファ２２のｂｉｔ３に格納する。
（５）ＰＥ［１６］のレジスタのｂｉｔ０を読み出し、データバッファ２２のｂｉｔ４に格納する。
（６）ＰＥ［０］のレジスタへ、データバッファ２２のデータを書き込む。
（７）ＰＥ［２０］のレジスタのｂｉｔ０を読み出し、データバッファ２２のｂｉｔ０に格納する。
（８）ＰＥ［２４］のレジスタのｂｉｔ０を読み出し、データバッファ２２のｂｉｔ１に格納する。
（９）ＰＥ［２８］のレジスタのｂｉｔ０を読み出し、データバッファ２２のｂｉｔ２に格納する。
（１０）ＰＥ［３２］のレジスタのｂｉｔ０を読み出し、データバッファ２２のｂｉｔ３に格納する。
（１１）ＰＥ［３６］のレジスタのｂｉｔ０を読み出し、データバッファ２２のｂｉｔ４に格納する。
（１２）ＰＥ［１］のレジスタへ、データバッファ２２のデータを書き込む。
（１３）ＰＥ［４０］のレジスタのｂｉｔ０を読み出し、データバッファ２２のｂｉｔ０に格納する。
（１４）ＰＥ［４４］のレジスタのｂｉｔ０を読み出し、データバッファ２２のｂｉｔ１に格納する。
（１５）ＰＥ［４８］のレジスタのｂｉｔ０を読み出し、データバッファ２２のｂｉｔ２に格納する。
（１６）ＰＥ［５２］のレジスタのｂｉｔ０を読み出し、データバッファ２２のｂｉｔ３に格納する。
（１７）ＰＥ［５６］のレジスタのｂｉｔ０を読み出し、データバッファ２２のｂｉｔ４に格納する。
（１８）ＰＥ［２］のレジスタへ、データバッファ２２のデータを書き込む。
・・・・（同様の動作が繰り返されるため、途中略する。）
（７３）ＰＥ［２４０］のレジスタのｂｉｔ０を読み出し、データバッファ２２のｂｉｔ０に格納する。
（７４）ＰＥ［２４４］のレジスタのｂｉｔ０を読み出し、データバッファ２２のｂｉｔ１に格納する。
（７５）ＰＥ［２４８］のレジスタのｂｉｔ０を読み出し、データバッファ２２のｂｉｔ２に格納する。
（７６）ＰＥ［２５２］のレジスタのｂｉｔ０を読み出し、データバッファ２２のｂｉｔ３に格納する。
（７７）ＰＥ［１２］のレジスタへ、データバッファ２２のデータを書き込む。
【００５５】
なお、最後のデータ｛ＰＥ［１２］のレジスタ｝へ格納すべきデータは、４ビットデータとなる。また、上記の動作では、処理後のレジスタ内のデータの並びは図５とは異なる。例えば、ＰＥ［０］のレジスタのｂｉｔ０のデータは、結局“ｂｉｔ０”に格納される（図５では、ｂｉｔ２に格納されている）。
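The combined "1/4 thinning" and "5 x 1-bit pack" sequence above can be modelled as follows. This is a sketch under stated assumptions: registers are Python integers in a list, the barrel shifter is modelled by a left shift, and the write-back replaces the whole destination register. The name `thin_and_pack` is hypothetical.

```python
def thin_and_pack(registers, factor=4, bits_per_word=5):
    """Collect bit0 of every `factor`-th PE and pack `bits_per_word` of them per register."""
    regs = list(registers)
    data_buffer = 0     # models data buffer 22
    filled = 0          # number of bits currently held in the buffer
    write_addr = 0
    steps = 0
    for read_addr in range(0, len(regs), factor):
        bit0 = regs[read_addr] & 1
        data_buffer |= bit0 << filled   # barrel shifter: place bit0 at position `filled`
        filled += 1
        steps += 1
        if filled == bits_per_word:     # buffer full: write it back
            regs[write_addr] = data_buffer
            write_addr += 1
            steps += 1
            data_buffer, filled = 0, 0
    if filled:                          # final, partially filled word (4 bits for 256 PEs)
        regs[write_addr] = data_buffer
        write_addr += 1
        steps += 1
    return regs, write_addr, steps


if __name__ == "__main__":
    source = [(i // 4) % 2 for i in range(256)]   # PE[4k] holds k % 2 in bit0
    regs, words, steps = thin_and_pack(source)
    print(regs[:13], words, steps)
```

As in the listed steps, the bit taken from the lowest-numbered PE of each group ends up in bit0 of the packed register, which is the ordering the paragraph above contrasts with FIG. 5.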
【００５６】
≪第４の実施の形態≫
画像処理の対象画像の（１ラインの、即ち主走査方向の）画素数が、ＳＩＭＤ型マイクロプロセッサ２のＰＥ個数よりも多い場合は、ＰＥ数で分割して処理を行なうことになる。分割される画素データは、例えば、図７のように複数のレジスタ（行）を利用して格納される（ここでは、対象画素が１０２４個、ＰＥ数が２５６個である）。
【００５７】
前に説明したように、そもそも本発明に係るＳＩＭＤ型マイクロプロセッサ２では、１つのＰＥ３の演算部において、該ＰＥ３の左隣３個までのＰＥ３に備わるレジスタ、及び該ＰＥ３の右隣３個までのＰＥ３に備わるレジスタと、アクセスできるように設定されている。例えば、画像データに関する所定のフィルタ処理を行なう場合、ある画素からみて（主走査方向上の）左右の幾つかの画素に係るデータが必要になることがある。上記設定はそのような処理に備えるためのものである。
【００５８】
ところが、画素データを分割する場合、上記のように左右のいくつかの画素データとアクセスすることが必須となる処理を行なおうとしても、分割の境界及びその近傍においてアクセスすべき画素データが得られない、という問題が発生することがある。具体的に説明する。
【００５９】
右３画素、左３画素を利用して、フィルタ処理を行なう場合を想定する。このとき、画素データは図７に示すようにレジスタ３４に格納されているとする。「画素２５５」をＲ０レジスタに格納するＰＥ［２５５］からは、右側３画素である「画素２５４」、「画素２５３」及び「画素２５２」に対して問題なくアクセスできる。しかし、左側３画素である「画素２５６」、「画素２５７」及び「画素２５８」は、ＰＥ［０］、ＰＥ［１］及びＰＥ［２］のＲ１レジスタに格納されているため、ＰＥ［２５５］からは即座にはアクセスできない。
【００６０】
上記の問題の対策として、分割の境界であるところの端部近傍のＰＥ、例えばＰＥ［０］の近傍のＰＥやＰＥ［２５５］の近傍のＰＥにおいては、データの重なりを持たせて画素データを格納させる、というものがある。
【００６１】
図８では、両端部近傍にて夫々４画素（データ）分の重なりを持たせて、画素データを格納している。つまり、ＰＥ［２５２］〜ＰＥ［２５５］のＲ０レジスタに格納されている画素２４８〜２５１は、画素２４７（ＰＥ［２４７］）までのＰＥからの「参照画素」として使用され、ＰＥ［０］〜ＰＥ［３］のＲ１レジスタに格納されている画素２４４〜２４７は、画素２４８（ＰＥ［２４８］）以降のＰＥからの「参照画素」として使用される。これらの「重なりの部分」は、画像処理終了後に最終的には不要データとして削除されて出力される。また、図面上の「ダミー」とは、１０２４画素分の画像データの両端に置かれている不要なデータであり、最終的には削除して出力される。このような「重なりの部分」の大きさは、画像処理の全体において左右方向の参照画素数がどの程度まで必要であるか、で決定される。通常は、１６画素程度である。
【００６２】
但し、上記のように、「重なりの部分」の画素データを格納するため、実質上有効なＰＥ数が減少する。「重なりの部分」の画素データは参照のためにのみ供されるからである。よって、分割数が増加し格納されるレジスタ数が増加する。図８においては、Ｒ４レジスタもデータ格納のために利用されている。
【００６３】
図８のように画像データを分割した場合、間引き処理後は図１０のようになる。図１０は、主走査方向の４画素を１ブロックとして１／４に間引いた処理を行なった結果である。図では、画素０〜３をブロック０、画素４〜７をブロック４、画素８〜１１をブロック８、．．．というように、ブロックの名称を付けている。
【００６４】
先ず図８の状況において、ブロックに含まれる画素データ、例えば画素０〜３の画素値をあるパターン値と比較して一致していれば“１”、一致しなければ“０”に、ブロックデータを設定して、ブロックの先頭の画素に係るＰＥ（例ではＰＥ［０］、ＰＥ［４］、ＰＥ［８］、...）のレジスタに格納する。このことにより、ブロックデータは図９のように４の倍数のＰＥに配置される。続いて、実施の形態１と同様にして１／４に間引くと、図１０のようにＰＥ［０］の方向に詰めて配置される。
【００６５】
ところで、図１０のような間引き処理後のブロックデータを使用して領域判定に係る処理を行なう場合、やはり左右方向のＰＥ３の備えるレジスタ（に格納されるデータ）の参照が必要となることがある。例えば、図１０において、参照ブロックデータが左右３ブロック（３個のＰＥ）まで必要な処理を想定する。図１０のＰＥ［６２］のレジスタＲ０に格納されているブロック２４４のデータに注目すると、右方向の参照ブロックデータは存在する。しかし、左方向の参照ブロックデータに関してはＰＥ［６３］のブロック２４８しか存在しない。同様に、ＰＥ［１］のレジスタＲ１に格納されているブロック２４８のデータは、左方向の参照ブロックデータは存在するが、右方向の参照ブロックデータはＰＥ［０］のブロック２４４しか存在しない。従って、このままでは処理を継続できないことになってしまう。
【００６６】
上記の問題点の解決策の一つとして、図８における間引き前の画素データの重なりの部分を相当に大きく取る、というものが考えられ得る。しかし、こうすると「重なりの部分」が図８のものの４倍も必要となってしまい、実質上有効なＰＥ個数はますます減少してしまう。
【００６７】
本発明の第４の実施の形態に係るＳＩＭＤ型マイクロプロセッサ２は、それらの問題点をも解決するものである。
【００６８】
図１２は、本発明の第４の実施の形態に係るＳＩＭＤ型マイクロプロセッサ２のブロック図である。図２に示される本発明の第１の実施の形態に係るＳＩＭＤ型マイクロプロセッサ２と、略、同様の構成であるが、この第４の実施の形態では、データバッファ２２内のバッファが５つ（ｂｕｆ０、ｂｕｆ１、ｂｕｆ２、ｂｕｆ３、ｂｕｆ４）となっている。
【００６９】
上記の構成において、２５６個のプロセッサエレメント３を有するＳＩＭＤ型マイクロプロセッサ２での、１／４の間引き動作は以下のようになる。
【００７０】
先ず、間引き処理の前に、ブロックデータを図１１のように予め配置する。必要なブロックデータは、図９と同様に、ＰＥ番号が「４の倍数」となるＰＥ（のレジスタ）に格納されている。更に、ＰＥ番号が「４の倍数−１」のＰＥ（のレジスタ）には、画像データを分割して形成された分割画像データの夫々の、１つ前の分割で取り扱われるブロックデータがＰＥ一つ分右方向にずらされて、格納されている。同様に、ＰＥ番号が「４の倍数＋１」のＰＥ（のレジスタ）には、画像データを分割して形成された分割画像データの夫々の、１つ後の分割で取り扱われるブロックデータがＰＥ一つ分左方向にずらされて、格納されている。
【００７１】
図１１において、例えば、Ｒ１レジスタに着目する。ＰＥ番号が４の倍数であるＰＥには、Ｒ１レジスタ（分割１）に格納されていたデータを処理したブロックデータが格納されている。つまり、前に説明したのと同様に、主走査方向の４画素を１ブロックとして、各ブロックに含まれる画素データに所定の演算を施しその結果値をブロックの先頭の画素に係るＰＥのＲ１レジスタに格納している。
【００７２】
同様に、ＰＥ番号が「４の倍数−１」であるＰＥには、Ｒ０レジスタ（分割０）に格納されていたデータを処理したブロックデータが格納されている。つまり、「分割０」において主走査方向の４画素を１ブロックとして、各ブロックに含まれる画素データに所定の演算を施し、その結果値を「４の倍数−１」のＰＥのＲ１レジスタに（ＰＥ一つ分右方向にずらして）格納している。さらに、ＰＥ番号が「４の倍数＋１」であるＰＥには、Ｒ２レジスタ（分割２）に格納されていたデータを処理したブロックデータが格納されている。つまり、「分割２」において主走査方向の４画素を１ブロックとして、各ブロックに含まれる画素データに所定の演算を施し、その結果値を「４の倍数＋１」のＰＥのＲ１レジスタに（ＰＥ一つ分左方向にずらして）格納している。
【００７３】
図１１のように配置されたブロックデータに対して、間引き器１０は、以下の動作を行なう。
【００７４】
（１）ＰＥ［２３５］のレジスタのデータを読み出し、データバッファ２２のｂｕｆ０に格納する。
（２）ＰＥ［０］のレジスタへデータバッファ２２のｂｕｆ０のデータを書き込む。
（３）ＰＥ［２３９］のレジスタのデータを読み出し、データバッファ２２のｂｕｆ０に格納する。
（４）ＰＥ［１］のレジスタへデータバッファ２２のｂｕｆ０のデータを書き込む。
（５）ＰＥ［２４３］のレジスタのデータを読み出し、データバッファ２２のｂｕｆ０に格納する。
（６）ＰＥ［２］のレジスタへデータバッファ２２のｂｕｆ０のデータを書き込む。
（７）ＰＥ［２４７］のレジスタのデータを読み出し、データバッファ２２のｂｕｆ０に格納する。
（８）ＰＥ［３］のレジスタへデータバッファ２２のｂｕｆ０のデータを書き込む。
（９）ＰＥ［５］のレジスタのデータを読み出し、データバッファ２２のｂｕｆ１に格納する。
（１０）ＰＥ［９］のレジスタのデータを読み出し、データバッファ２２のｂｕｆ２に格納する。
（１１）ＰＥ［１３］のレジスタのデータを読み出し、データバッファ２２のｂｕｆ３に格納する。
（１２）ＰＥ［１７］のレジスタのデータを読み出し、データバッファ２２のｂｕｆ４に格納する。
（１３）ＰＥ［４］のレジスタのデータを読み出し、データバッファ２２のｂｕｆ０に格納する。
（１４）ＰＥ［４］のレジスタへデータバッファ２２のｂｕｆ０のデータを書き込む。
（１５）ＰＥ［８］のレジスタのデータを読み出し、データバッファ２２のｂｕｆ０に格納する。
（１６）ＰＥ［５］のレジスタへデータバッファ２２のｂｕｆ０のデータを書き込む。
（１７）ＰＥ［１２］のレジスタのデータを読み出しデータバッファ２２のｂｕｆ０に格納する。
（１８）ＰＥ［６］のレジスタへデータバッファ２２のｂｕｆ０のデータを書き込む。
（１９）ＰＥ［１６］のレジスタのデータを読み出し、データバッファ２２のｂｕｆ０に格納する。
（２０）ＰＥ［７］のレジスタへデータバッファ２２のｂｕｆ０のデータを書き込む。
・・・・（同様の動作が繰り返されるため、途中略する。）
（１３１）ＰＥ［２４０］のレジスタのデータを読み出し、データバッファ２２のｂｕｆ０に格納する。
（１３２）ＰＥ［６３］のレジスタへデータバッファ２２のｂｕｆ０のデータを書き込む。
（１３３）ＰＥ［２４４］のレジスタのデータを読み出し、データバッファ２２のｂｕｆ０に格納する。
（１３４）ＰＥ［６４］のレジスタへデータバッファ２２のｂｕｆ０のデータを書き込む。
（１３５）ＰＥ［２４８］のレジスタのデータを読み出し、データバッファ２２のｂｕｆ０に格納する。
（１３６）ＰＥ［６５］のレジスタへデータバッファ２２のｂｕｆ０のデータを書き込む。
（１３７）ＰＥ［６６］のレジスタへデータバッファ２２のｂｕｆ１のデータを書き込む。
（１３８）ＰＥ［６７］のレジスタへデータバッファ２２のｂｕｆ２のデータを書き込む。
（１３９）ＰＥ［６８］のレジスタへデータバッファ２２のｂｕｆ３のデータを書き込む。
（１４０）ＰＥ［６９］のレジスタへデータバッファ２２のｂｕｆ４のデータを書き込む。
【００７５】
上記の動作により、間引き後のブロックデータは、図１３のように配置される。図１０とは異なり、ＰＥ［６５］のＲ０レジスタに格納されているブロック２４４のデータには、左右いずれの方向も参照ブロックデータが存在しており、同様にＰＥ［４］のＲ１レジスタに格納されているブロック２４８のデータにも、左右いずれの方向も参照ブロックデータが存在している。
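The 140-step sequence above can be checked with a short model. The sketch below assumes the block data has already been laid out as in FIG. 11 and represents one register across 256 PEs as a list; the constant and function names are hypothetical, and the five list slots stand in for buf0 to buf4 of the data buffer.

```python
PREV_SPLIT_SRC = (235, 239, 243, 247)   # "multiple of 4 minus 1" PEs read in steps (1)-(8)
NEXT_SPLIT_SRC = (5, 9, 13, 17)         # "multiple of 4 plus 1" PEs saved in steps (9)-(12)


def thin_with_reference_blocks(registers):
    """Model of the fourth-embodiment sequence on one register across 256 PEs."""
    regs = list(registers)
    buf = [None] * 5                    # buf0..buf4
    steps = 0
    # Steps (1)-(8): reference blocks of the previous split go to PE[0]..PE[3].
    for dst, src in enumerate(PREV_SPLIT_SRC):
        buf[0] = regs[src]
        regs[dst] = buf[0]
        steps += 2
    # Steps (9)-(12): save the next split's reference blocks before they are overwritten.
    for i, src in enumerate(NEXT_SPLIT_SRC, start=1):
        buf[i] = regs[src]
        steps += 1
    # Steps (13)-(136): PE[4], PE[8], ..., PE[248] are packed into PE[4]..PE[65].
    for k in range(1, 63):
        buf[0] = regs[4 * k]
        regs[3 + k] = buf[0]
        steps += 2
    # Steps (137)-(140): restore the saved reference blocks to PE[66]..PE[69].
    for i in range(1, 5):
        regs[65 + i] = buf[i]
        steps += 1
    return regs, steps


if __name__ == "__main__":
    regs, steps = thin_with_reference_blocks(list(range(256)))
    print(regs[:8], regs[62:70], steps)
```

The extra buffers are needed in this model because PE[5] is overwritten as soon as PE[8] is packed into it, long before the saved value is written to PE[66].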
【００７６】
上記の第４の実施の形態では、データバッファ２２のバッファ数を５つとして構成している。これは、ＰＥ［５］、ＰＥ［９］、ＰＥ［１３］、ＰＥ［１７］のレジスタのデータが、間引き後に１／４に詰められたデータにより上書きされて消失するのを防ぐためである。このこと（即ち、消失を防ぐこと）は図２に示される第１の実施の形態の構成においても実現可能である。つまり、ＰＥ［５］、ＰＥ［９］、ＰＥ［１３］、ＰＥ［１７］のレジスタのデータを、一旦上記処理で他の用途に利用されないＰＥのレジスタ、例えばＰＥ［２３５］、ＰＥ［２３９］、ＰＥ［２４３］、ＰＥ［２４７］のレジスタに待避し、ＰＥ［６６］〜ＰＥ［６９］に書き込む際にそれらから読み出す、というようなシーケンスの変更だけでも実現可能である。
【００７７】
第３の実施の形態で示した、ビットパックによる間引き処理においても、画像データがＰＥ数により分割される場合は、分割境界における参照データが欠缺しているという問題を回避するために、この第４の実施の形態と同様な処理が必要となることがある。
【００７８】
≪第５の実施の形態≫
ブロックデータを使用した領域判定処理の実施後は、その判定結果により各画素毎に画像処理内容が決定される。したがって、１／４（等）に間引きしたデータより得られた領域判定結果は、４倍（等）に拡大して元の画素の位置にまで戻さなければならない。第１の実施の形態のＳＩＭＤ型マイクロプロセッサ２により１／４に間引いた場合は、第２の実施の形態のＳＩＭＤ型マイクロプロセッサ２により拡大すれば判定結果データを対応する元の画素の位置にまで戻すことができる。しかし、第３の実施の形態では、ビットパックが形成されているため、元に戻す（即ち、ビットパックを戻す、ビットアンパックする）には別の実施の形態を利用する必要がある。
【００７９】
図１４は、本発明の第５の実施の形態に係るＳＩＭＤ型マイクロプロセッサ２のブロック図である。
【００８０】
図３に示される第２の実施の形態のＳＩＭＤ型マイクロプロセッサ２に対して、データバッファ２２からの読み出しにバレルシフタ２６’が追加されており、データバッファ２２の任意のビット位置のデータを、レジスタ３４の任意のビットに書き込むことが可能な構成となっている。
【００８１】
上記の構成において、２５６個のプロセッサエレメント３を有するＳＩＭＤ型マイクロプロセッサ２での、「４倍の拡大」且つ「５×１ビットアンパック」の動作は以下のようになる。ここで、「データバッファのｂｉｔ０」とは、所定のデータバッファ２２の最下位ビットを示す。同様に、「ｂｉｔ１」、「ｂｉｔ２」、「ｂｉｔ３」、「ｂｉｔ４」は、ｂｉｔ０からみて順に１つずつ上位のビットを示す。
【００８２】
（１）ＰＥ［１２］のレジスタのデータを読み出し、データバッファ２２に格納する。
（２）ＰＥ［２５２］のレジスタへ、データバッファ２２のｂｉｔ３のデータを書き込む。
（３）ＰＥ［２４８］のレジスタへ、データバッファ２２のｂｉｔ２のデータを書き込む。
（４）ＰＥ［２４４］のレジスタへ、データバッファ２２のｂｉｔ１のデータを書き込む。
（５）ＰＥ［２４０］のレジスタへ、データバッファ２２のｂｉｔ０のデータを書き込む。
（６）ＰＥ［１１］のレジスタのデータを読み出し、データバッファ２２に格納する。
（７）ＰＥ［２３６］のレジスタへ、データバッファ２２のｂｉｔ４のデータを書き込む。
（８）ＰＥ［２３２］のレジスタへ、データバッファ２２のｂｉｔ３のデータを書き込む。
（９）ＰＥ［２２８］のレジスタへ、データバッファ２２のｂｉｔ２のデータを書き込む。
（１０）ＰＥ［２２４］のレジスタへ、データバッファ２２のｂｉｔ１のデータを書き込む。
（１１）ＰＥ［２２０］のレジスタへ、データバッファ２２のｂｉｔ０のデータを書き込む。
・・・・（同様の動作が繰り返されるため、途中略する。）
（７２）ＰＥ［０］のレジスタのデータを読み出し、データバッファ２２に格納する。
（７３）ＰＥ［１６］のレジスタへ、データバッファ２２のｂｉｔ４のデータを書き込む。
（７４）ＰＥ［１２］のレジスタへ、データバッファ２２のｂｉｔ３のデータを書き込む。
（７５）ＰＥ［８］のレジスタへ、データバッファ２２のｂｉｔ２のデータを書き込む。
（７６）ＰＥ［４］のレジスタへ、データバッファ２２のｂｉｔ１のデータを書き込む。
（７７）ＰＥ［０］のレジスタへ、データバッファ２２のｂｉｔ０のデータを書き込む。
【００８３】
なお、最後｛上記（７７）｝のＰＥ［０］への書き込み動作は、データに変換が無いため省略可能である。
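The "4x enlargement" with "5 x 1-bit unpack" sequence above can be sketched as the inverse of the packing operation. The model below makes two assumptions that the text does not spell out: the extracted bit is written so that it becomes the whole value of the destination register, and the packed words occupy PE[0] to PE[12] with bit0 holding the lowest-numbered block. The name `unpack_and_expand` is hypothetical.

```python
def unpack_and_expand(registers, blocks=64, factor=4, bits_per_word=5):
    """Spread packed 1-bit block data back to every `factor`-th PE, highest PE first."""
    regs = list(registers)
    steps = 0
    last_word = (blocks - 1) // bits_per_word        # PE[12] for 64 blocks
    for word in range(last_word, -1, -1):
        data_buffer = regs[word]                     # read one packed register
        steps += 1
        first = word * bits_per_word
        last = min(first + bits_per_word, blocks) - 1
        for block in range(last, first - 1, -1):     # bit4 (or bit3) down to bit0
            bit = (data_buffer >> (block - first)) & 1   # barrel shifter selects one bit
            regs[block * factor] = bit
            steps += 1
    return regs, steps


if __name__ == "__main__":
    bits = [k % 2 for k in range(64)]
    packed = [0] * 256
    for k, b in enumerate(bits):
        packed[k // 5] |= b << (k % 5)
    regs, steps = unpack_and_expand(packed)
    print([regs[4 * k] for k in range(8)], steps)
```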
【００８４】
≪第６の実施の形態≫
ブロックデータを使用した領域判定処理の実施後は、その判定結果により各画素毎に画像処理内容が決定される。したがって、１／４（等）に間引きしたデータより得られた領域判定結果は、４倍（等）に拡大して元の画素の位置にまで戻さなければならない。第１の実施の形態のＳＩＭＤ型マイクロプロセッサ２により１／４に間引いた場合は、第２の実施の形態のＳＩＭＤ型マイクロプロセッサ２により拡大すれば判定結果データを対応する元の画素の位置にまで戻すことができる。
【００８５】
第５の実施の形態の説明で記したが、第２の実施の形態のＳＩＭＤ型マイクロプロセッサ２を利用して、領域判定結果データが４倍（等）に拡大されると、ＰＥ番号が４の倍数であるＰＥにて、有効な領域判定結果データが配置される。図１５はその様子を示す。
【００８６】
拡大後の処理で、４の倍数以外のＰＥ番号を備えるＰＥにおいては、そのＰＥ番号以下で且つ直近の４の倍数のＰＥ番号を備えるＰＥの、レジスタに格納される領域判定結果データを使用することになる。即ち、ＰＥ［０］〜ＰＥ［３］はＰＥ［０］のレジスタ上の領域判定結果データ、ＰＥ［４］〜ＰＥ［７］はＰＥ［４］のレジスタ上の領域判定結果データ、ＰＥ［８］〜ＰＥ［１１］はＰＥ［８］のレジスタ上の領域判定結果データ、・・・を使用することとなる。換言すると、ＰＥ番号が「４の倍数」のＰＥは自身のレジスタ上のデータ、ＰＥ番号が「４の倍数＋１」のＰＥは１つ右のＰＥのレジスタ上のデータ、ＰＥ番号が「４の倍数＋２」のＰＥは２つ右のＰＥのレジスタ上のデータ、ＰＥ番号が「４の倍数＋３」のＰＥは３つ右のＰＥのレジスタ上のデータを参照することになる。
【００８７】
上記のように、ＰＥ毎に処理が異なるものになる。そのため、全てのＰＥで同じ処理を実行（並列処理）するという、ＳＩＭＤ型マイクロプロセッサの特色をうまく利用することができない。
【００８８】
図３のブロック図に示されるＳＩＭＤ型マイクロプロセッサ２は、本発明の第２の実施の形態に係るものであると同時に、本発明の第６の実施の形態に係るものである。但し、第６の実施の形態においては、第２の実施の形態と処理のシーケンスが異なる。
【００８９】
上記の構成において、２５６個のプロセッサエレメント３を有するＳＩＭＤ型マイクロプロセッサ２での、４倍の拡大動作は以下のようになる。
【００９０】
（１）ＰＥ［６３］（アドレス６３）のレジスタのデータを読み出し、データバッファ２２に格納する。
（２）ＰＥ［２５５］（アドレス２５５）のレジスタへ、データバッファ２２のデータを書き込む。
（３）ＰＥ［２５４］（アドレス２５４）のレジスタへ、データバッファ２２のデータを書き込む。
（４）ＰＥ［２５３］（アドレス２５３）のレジスタへ、データバッファ２２のデータを書き込む。
（５）ＰＥ［２５２］（アドレス２５２）のレジスタへ、データバッファ２２のデータを書き込む。
（６）ＰＥ［６２］（アドレス６２）のレジスタのデータを読み出し、データバッファ２２に格納する。
（７）ＰＥ［２５１］（アドレス２５１）のレジスタへ、データバッファ２２のデータを書き込む。
（８）ＰＥ［２５０］（アドレス２５０）のレジスタへ、データバッファ２２のデータを書き込む。
（９）ＰＥ［２４９］（アドレス２４９）のレジスタへ、データバッファ２２のデータを書き込む。
（１０）ＰＥ［２４８］（アドレス２４８）のレジスタへ、データバッファ２２のデータを書き込む。
（１１）ＰＥ［６１］（アドレス６１）のレジスタのデータを読み出し、データバッファ２２に格納する。
（１２）ＰＥ［２４７］（アドレス２４７）のレジスタへ、データバッファ２２のデータを書き込む。
（１３）ＰＥ［２４６］（アドレス２４６）のレジスタへ、データバッファ２２のデータを書き込む。
（１４）ＰＥ［２４５］（アドレス２４５）のレジスタへ、データバッファ２２のデータを書き込む。
（１５）ＰＥ［２４４］（アドレス２４４）のレジスタへ、データバッファ２２のデータを書き込む。
・・・・（同様の動作が繰り返されるため、途中略する。）
（３１１）ＰＥ［１］（アドレス１）のレジスタのデータを読み出し、データバッファ２２に格納する。
（３１２）ＰＥ［７］（アドレス７）のレジスタへ、データバッファ２２のデータを書き込む。
（３１３）ＰＥ［６］（アドレス６）のレジスタへ、データバッファ２２のデータを書き込む。
（３１４）ＰＥ［５］（アドレス５）のレジスタへ、データバッファ２２のデータを書き込む。
（３１５）ＰＥ［４］（アドレス４）のレジスタへ、データバッファ２２のデータを書き込む。
（３１６）ＰＥ［０］（アドレス０）のレジスタのデータを読み出し、データバッファ２２に格納する。
（３１７）ＰＥ［３］（アドレス３）のレジスタへ、データバッファ２２のデータを書き込む。
（３１８）ＰＥ［２］（アドレス２）のレジスタへ、データバッファ２２のデータを書き込む。
（３１９）ＰＥ［１］（アドレス１）のレジスタへ、データバッファ２２のデータを書き込む。
（３２０）ＰＥ［０］（アドレス０）のレジスタへ、データバッファ２２のデータを書き込む。
【００９１】
なお、最後｛上記（３２０）｝のＰＥ［０］への書き込み動作は、データに変換が無いため省略可能である。
【００９２】
上記の動作によって拡大後のデータは図１６のようになる。よって、全てのＰＥにおいて、そのＰＥ自身に備わるレジスタに格納されるデータを、領域判定結果として参照すればよいことになる。
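The 320-step sequence above differs from the second embodiment only in that each value read is written to four consecutive PEs. A minimal model, again treating one register across the PEs as a list and using the hypothetical name `expand_with_copy`, is:

```python
def expand_with_copy(registers, factor=4, count=64):
    """Enlarge by `factor`, copying each source value into `factor` consecutive PEs."""
    regs = list(registers)
    steps = 0
    for src in range(count - 1, -1, -1):             # PE[63] down to PE[0]
        data_buffer = regs[src]                      # one read per source PE
        steps += 1
        top = src * factor + factor - 1
        for dst in range(top, src * factor - 1, -1): # e.g. PE[255], 254, 253, 252
            regs[dst] = data_buffer
            steps += 1
    return regs, steps


if __name__ == "__main__":
    source = list(range(100, 164)) + [0] * 192
    regs, steps = expand_with_copy(source)
    print(regs[:8], steps)
```

After this pass every PE holds its own copy of the determination result, so a subsequent SIMD instruction can use the same register in all PEs without per-PE neighbour selection.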
【００９３】
≪第７の実施の形態≫
前述の第１の実施の形態の構成では、「間引き」の単位は、２のベキ乗（２、４、８、１６、．．．）程度である。よって、ブロック化等の処理においては有効な構成である。ただし、画像データの縮小等の処理では、「間引き」の間隔が一定ではない。即ち、「間引き」の間隔が任意に指定され得る手段が備わることが望ましい。
【００９４】
図１７は、本発明に係る第７の実施の形態のＳＩＭＤ型マイクロプロセッサ２のブロック図である。図２の第１の実施の形態と基本的には同様の構成であるが、シーケンサ２４において、間引き間隔の指定のためのＲＡＭ３０が設置されている。ＲＡＭ３０の容量は、単位あたり１ビットのデータを、縮小対象の画像データの主走査画素数分まで、少なくとも備え得る程度のものである。
【００９５】
上記の構成において、縮小動作は以下のように行なう。
【００９６】
（１）アドレス生成回路２０のリードポインタ、ライトポインタを“０”に設定する。同時に、ＲＡＭ３０のアドレスポインタを“０”に設定する。
（２）リードポインタの値をアドレスとするＰＥのレジスタデータを読み出し、データバッファ２２に格納する。同時に、ＲＡＭ３０のアドレスポインタで指定されたアドレスよりＲＡＭ３０のデータを読み出し、リードポインタ、ＲＡＭアドレスポインタを“１”だけインクリメントする。
（３）読み出されたＲＡＭ３０のデータが“１”であった場合は、ライトポインタをアドレスとするＰＥのレジスタに、データバッファ２２のデータを書き込み、ライトポインタを“１”だけインクリメントする。
（４）上記の（２）（３）の動作を、リードポインタがＰＥの最大数を越えるまで繰り返す。
【００９７】
上記の動作により、ＲＡＭ３０において“１”が格納されていない箇所（アドレス）（即ち、“０”が格納されている箇所（アドレス））に対応する画素は間引かれ、ＲＡＭ３０において“１”が格納されている箇所（アドレス）に対応する画素はＰＥ［０］方向に詰められて、レジスタにデータが配置されることになる。
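Steps (1) to (4) of the reduction procedure above amount to a mask-driven compaction. The sketch below models RAM 30 as a list of 0/1 flags and one register across the PEs as a list; the name `reduce_with_mask` is hypothetical, and the loop bound is an interpretation of "until the read pointer exceeds the number of PEs".

```python
def reduce_with_mask(registers, mask):
    """Keep the PEs whose mask bit is 1 and pack them toward PE[0]."""
    regs = list(registers)
    read_ptr = write_ptr = ram_ptr = 0        # step (1)
    while read_ptr < len(regs):               # step (4)
        data_buffer = regs[read_ptr]          # step (2): read the PE and the RAM bit
        keep = mask[ram_ptr]
        read_ptr += 1
        ram_ptr += 1
        if keep == 1:                         # step (3): write back only when flagged
            regs[write_ptr] = data_buffer
            write_ptr += 1
    return regs, write_ptr


if __name__ == "__main__":
    regs, kept = reduce_with_mask(list(range(8)), [1, 0, 1, 1, 0, 0, 1, 0])
    print(regs[:kept], kept)
```

Setting every fourth mask bit to 1 reproduces the fixed 1/4 thinning of the first embodiment, while an irregular pattern gives a non-integer reduction ratio.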
【００９８】
縮小対象の画像データの画素数がＰＥ数よりも大きく、画像データを分割して処理すべき場合においても、２つ目以降の分割画像データの縮小処理で上記（１）のＲＡＭアドレスポインタのクリアを行なわずに処理を継続することができる。２つ目以降の分割画像データの縮小処理では、以下のような処理となる。
【００９９】
（１）アドレス生成回路２０のリードポインタ、ライトポインタを“０”に設定する。
（２）リードポインタの値をアドレスとするＰＥのレジスタデータを読み出し、データバッファ２２に格納する。同時に、ＲＡＭ３０のアドレスポインタで指定されたアドレスよりＲＡＭ３０のデータを読み出し、リードポインタ、ＲＡＭアドレスポインタを“１”だけインクリメントする。
（３）読み出されたＲＡＭ３０のデータが“１”であった場合は、ライトポインタをアドレスとするＰＥのレジスタに、データバッファ２２のデータを書き込み、ライトポインタを“１”だけインクリメントする。
（４）上記の（２）（３）の動作を、リードポインタがＰＥの最大数を越えるまで繰り返す。
【０１００】
なお、上記の動作では、画像データを分割した場合に、縮小画像データが分割された単位で下位のＰＥ側に間引かれて（詰められて）配置される、という問題点がある（上位のＰＥには“ダミー”データが並ぶことになる）。更に、縮小画像データにおいては、特に上記の「第４の実施の形態」で説明したような参照画素データが配置されていない、という問題点も発生する。そこで、図１８のようにリード側のレジスタ６’（縮小前画像）とライト側のレジスタ（縮小後画像）６”を分けて接続する。図１８の形態において、リード側のレジスタにおいてのみ画像データを分割して格納し、さらに、リードポインタのみを２つ目以降の画像データに進めることにより、縮小動作を継続する。こうすることで、分割画像の縮小後画像データの最終端のＰＥの次のＰＥから、次の分割画像の縮小画像データを配置することが可能となる。この場合には、リードポインタを２つ目以降の画像データに進めても、ライトポインタのクリアは行なわない。
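The arrangement with separate read-side and write-side registers can be sketched as below. This is a simplified model, not the circuit of FIG. 18: the write-side register file is represented by a growing Python list (so its length plays the role of the write pointer), each split of the source image is a separate list, and the name `reduce_segments` is hypothetical. The point illustrated is that the RAM address pointer and the write pointer are carried over from one split to the next.

```python
def reduce_segments(segments, mask):
    """Mask-driven reduction over several image splits with a persistent write pointer."""
    out = []          # write-side registers; len(out) acts as the write pointer
    ram_ptr = 0       # not cleared between splits
    for segment in segments:
        for read_ptr in range(len(segment)):   # read pointer restarts for each split
            data_buffer = segment[read_ptr]
            if mask[ram_ptr] == 1:
                out.append(data_buffer)        # continues right after the previous split
            ram_ptr += 1
    return out


if __name__ == "__main__":
    result = reduce_segments([[0, 1, 2, 3], [4, 5, 6, 7]], [1, 0, 1, 0, 0, 1, 1, 1])
    print(result)
```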
【０１０１】
≪第８の実施の形態≫
前述の第２の実施の形態の構成では、「拡大」の単位は、２のベキ乗（２、４、８、１６、．．．）程度である。よって、ブロック化された領域判定データの現画像データへの反映のための拡大処理等においては、有効な構成である。ただし、画像データの拡大等の処理では、「拡大」の間隔が一定ではない。即ち、「拡大」の間隔が任意に指定され得る手段が備わることが望ましい。
【０１０２】
図１９は、本発明に係る第８の実施の形態のＳＩＭＤ型マイクロプロセッサ２のブロック図である。図３の第２の実施の形態と基本的には同様の構成であるが、シーケンサ２４において、拡大間隔の指定のためのＲＡＭ３０が設置されている。ＲＡＭ３０の容量は、単位あたり１ビットのデータを、拡大後の画像データの主走査画素数分まで、少なくとも備え得る程度のものである。
【０１０３】
上記の構成において、拡大動作は以下のように行なう。
【０１０４】
（１）アドレス生成回路２０のリードポインタを拡大前の画像データの主走査画素値、ライトポインタを拡大後の画像データの主走査画素値に設定する。同時に、ＲＡＭ３０のアドレスポインタを拡大後の画像データの主走査画素値に設定する。
（２）リードポインタの値をアドレスとするＰＥのレジスタデータを読み出し、データバッファ２２に格納する。同時に、ＲＡＭ３０のアドレスポインタで指定されたアドレスよりＲＡＭ３０のデータを読み出し、ＲＡＭ３０のアドレスポインタを“１”だけデクリメントする。読み出されたＲＡＭ３０のデータが“１”であった場合は、リードポインタを“１”だけデクリメントする。
（３）ライトポインタをアドレスとするＰＥのレジスタに、データバッファ２２のデータを書き込む。ライトポインタを“１”だけデクリメントする。
（４）上記の（２）（３）の動作を、ライトポインタが０となるまで繰り返す。
【０１０５】
上記の動作により、ＲＡＭ３０において“１”が格納されていない箇所（アドレス）（即ち、“０”が格納されている箇所（アドレス））に対応する画素には、リードポインタの直前のアドレスの（拡大前）画素データが書き込まれ、ＲＡＭ３０において“１”が格納されている箇所（アドレス）に対応する画素には、リードポインタの指し示すアドレスの（拡大前）画素データが書き込まれる。このようにして、拡大動作が実現され得る。
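The enlargement procedure in steps (1) to (4) can be modelled as a mask-driven expansion that runs from the highest PE downward. The sketch below rests on several assumptions: PEs are indexed from 0, so the pointers start at "count minus 1" and the loop runs through address 0; RAM 30 is a list with one 0/1 flag per output pixel; and a flag of 1 means the read pointer moves on to the next source pixel after the current one is written, while 0 means the same source pixel is written again at the next output position. The name `enlarge_with_mask` is hypothetical.

```python
def enlarge_with_mask(registers, source_count, mask):
    """Expand PE[0..source_count-1] in place to len(mask) PEs, duplicating where mask is 0."""
    regs = list(registers)
    read_ptr = source_count - 1                       # step (1)
    for write_ptr in range(len(mask) - 1, -1, -1):    # step (4): down to address 0
        if read_ptr < 0:
            raise ValueError("mask contains more 1s than there are source pixels")
        data_buffer = regs[read_ptr]                  # step (2): read the PE and the RAM bit
        if mask[write_ptr] == 1:
            read_ptr -= 1
        regs[write_ptr] = data_buffer                 # step (3)
    return regs


if __name__ == "__main__":
    print(enlarge_with_mask([10, 20, 30, 0, 0], 3, [1, 0, 1, 1, 0]))
```

In this model the write pointer never falls below the read pointer, so writing in place does not destroy source pixels that are still to be read.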
【０１０６】
なお、上記の動作では画像データを分割した場合に、拡大画像データが拡大前の画像データを、それらが読み出される前に、上書きしてしまうという問題点が発生し得る。そこで、図２０のように、リード側のレジスタ６’（拡大前画像データ）とライト側のレジスタ６”（拡大後画像データ）を分けて接続する。動作終了後にライト側のレジスタの内容を、拡大後の分割画像データ格納レジスタに転送することにより、上記の上書きされるという問題点は回避され得る。このとき、以下のような動作となる。
【０１０７】
（１）アドレス生成回路２０のリードポインタを拡大前の画像データの主走査画素値に設定する。ライトポインタを、拡大後の画像データの主走査画素値をＰＥ個数で割った余りに、設定する。さらに、ＲＡＭ３０のアドレスポインタを拡大後の画像データの主走査画素値に設定する。
（２）リードポインタの値をアドレスとするＰＥのレジスタデータを読み出し、データバッファ２２に格納する。同時に、ＲＡＭ３０のアドレスポインタで指定されたアドレスよりＲＡＭ３０のデータを読み出し、ＲＡＭ３０のアドレスポインタを“１”だけデクリメントする。読み出されたＲＡＭ３０のデータが“１”であった場合は、リードポインタを“１”だけデクリメントする。
（３）ライトポインタをアドレスとするＰＥのレジスタに、データバッファ２２のデータを書き込む。ライトポインタを“１”だけデクリメントする。
（４）上記の（２）（３）の動作を、ライトポインタが０となるまで繰り返す。
（５）拡大後レジスタのデータを分割画像データ格納レジスタの「１分割目」に転送する。
（６）ライトポインタを、ＰＥ個数（の値）に設定する。
（７）リードポインタの値をアドレスとするＰＥのレジスタデータを読み出し、データバッファ２２に格納する。同時に、ＲＡＭ３０のアドレスポインタで指定されたアドレスよりＲＡＭ３０のデータを読み出し、ＲＡＭ３０のアドレスポインタを“１”だけデクリメントする。読み出されたＲＡＭ３０のデータが“１”であった場合は、リードポインタを“１”だけデクリメントする。
（８）ライトポインタをアドレスとするＰＥのレジスタに、データバッファ２２のデータを書き込む。ライトポインタを“１”だけデクリメントする。
（９）上記の（７）（８）の動作をライトポインタが０となるまで繰り返す。
（１０）拡大後レジスタのデータを分割画像データ格納レジスタの「次の分割」に転送する。
（１１）上記の（６）〜（１０）をリードポインタが０となるまで繰り返す。
【０１０８】
なお、拡大前画像データが分割されていた場合には、拡大前レジスタの内容を処理中に変更し、各ポインタの制御を適宜変更することにより、対応可能である。
【０１０９】
≪その他の実施の形態≫
以上の実施の形態における間引き器は、いずれもシーケンサ２４におけるシーケンスが異なることが主たる相違であるため、当然、幾つかの実施の形態を兼ね備えるような構成も簡単に実現できる。このことは以上の実施の形態における拡大器についても同様である。さらに、例えば、第１の実施の形態から第６の実施の形態の機能を、全部兼ね備える１つの装置を構成することも可能である。
【０１１０】
【発明の効果】
本発明を利用することにより、以下のような効果を得ることができる。
【０１１１】
第１の実施の形態に係るＳＩＭＤ型マイクロプロセッサを利用すると、画素データにおいて主走査方向の間引き処理が、簡素な構成かつ低コストにより、実現することができる。
【０１１２】
第２の実施の形態に係るＳＩＭＤ型マイクロプロセッサを利用すると、画素データにおいて主走査方向の拡大処理が、簡素な構成かつ低コストにより、実現することができる。
【０１１３】
第３の実施の形態に係るＳＩＭＤ型マイクロプロセッサを利用すると、より広い範囲の参照画素データを一度の処理で扱えるようになる。
【０１１４】
第４の実施の形態に係るＳＩＭＤ型マイクロプロセッサを利用すると、ＰＥ個数を越える画素数を持つ画像データに対し、フィルタ処理などで必要とされる（近傍の）参照データを考慮しつつ、間引き処理を行なうことが可能となる。
【０１１５】
第５の実施の形態に係るＳＩＭＤ型マイクロプロセッサを利用すると、ビットパックされたデータを拡大処理することが可能となる。
【０１１６】
第６の実施の形態に係るＳＩＭＤ型マイクロプロセッサを利用すると、拡大処理やブロック化処理の後、ＳＩＭＤ演算処理を少ないステップで行なうことができる。
【０１１７】
第７の実施の形態に係るＳＩＭＤ型マイクロプロセッサを利用すると、画像縮小処理を実現することができる。
【０１１８】
第８の実施の形態に係るＳＩＭＤ型マイクロプロセッサを利用すると、画像拡大処理を実現することができる。
【図面の簡単な説明】
【図１】本発明に係るＳＩＭＤ型マイクロプロセッサの概略の構成を示すブロック図（１）である。
【図２】第１の実施の形態に係るＳＩＭＤ型マイクロプロセッサのブロック図である。
【図３】第２の実施の形態に係るＳＩＭＤ型マイクロプロセッサのブロック図である。
【図４】演算部とレジスタとの関係を示すブロック図である。
【図５】演算部と、ビットパックデータが格納されたレジスタとの関係を示すブロック図である。
【図６】第３の実施の形態に係るＳＩＭＤ型マイクロプロセッサのブロック図である。
【図７】画素数がＳＩＭＤ型マイクロプロセッサのＰＥ個数よりも多い場合の、画素データがレジスタに格納される状況を示す模式図である。
【図８】画素数がＳＩＭＤ型マイクロプロセッサのＰＥ個数よりも多い場合、重なり部分を形成しつつ画素データがレジスタに格納される状況を示す模式図である。
【図９】間引き前のブロックデータがレジスタに格納される状況を示す模式図である。
【図１０】間引き後のブロックデータがレジスタに格納される状況を示す模式図である。
【図１１】第４の実施の形態に係るＳＩＭＤ型マイクロプロセッサにて利用されるブロックデータの、格納の状況を示す模式図である。
【図１２】第４の実施の形態に係るＳＩＭＤ型マイクロプロセッサのブロック図である。
【図１３】第４の実施の形態に係るＳＩＭＤ型マイクロプロセッサによって形成される、間引き後のブロックデータのレジスタへの格納の好適な状況を示す模式図である。
【図１４】第５の実施の形態に係るＳＩＭＤ型マイクロプロセッサのブロック図である。
【図１５】第６の実施の形態に係るＳＩＭＤ型マイクロプロセッサにて利用されるブロックデータの格納の状況を示す模式図である。
【図１６】第６の実施の形態に係るＳＩＭＤ型マイクロプロセッサにて形成されるブロックデータの格納の状況を示す模式図である。
【図１７】第７の実施の形態に係るＳＩＭＤ型マイクロプロセッサのブロック図（１）である。
【図１８】第７の実施の形態に係るＳＩＭＤ型マイクロプロセッサのブロック図（２）である。
【図１９】第８の実施の形態に係るＳＩＭＤ型マイクロプロセッサのブロック図（１）である。
【図２０】第８の実施の形態に係るＳＩＭＤ型マイクロプロセッサのブロック図（２）である。
【図２１】本発明に係るＳＩＭＤ型マイクロプロセッサの概略の構成を示すブロック図（２）である。
【図２２】ブロック化処理の概念図である。
【符号の説明】
２・・・ＳＩＭＤ型マイクロプロセッサ、
３・・・プロセッサエレメント（ＰＥ）、
４・・・グローバルプロセッサ（ＧＰ）、
６、６’、６”・・・レジスタファイル、
８・・・演算アレイ、
１０・・・間引き器、
１２・・・拡大器、
１４、１４’、１４”・・・外部インタフェース、
２０、２０’、２０”・・・アドレス生成回路、
２２・・・データバッファ、
２４・・・シーケンサ、
２６、２６’・・・バレルシフタ、
３０・・・ＲＡＭ（ランダムアクセスメモリ）、
３４・・・レジスタ。
[0001]
[Technical Field of the Invention]
The present invention relates to a SIMD (Single Instruction-stream, Multiple Data-stream) type microprocessor.
[0002]
[Prior art]
A SIMD type microprocessor can execute the same arithmetic operation on a plurality of data items simultaneously with a single instruction. Because of this structure, it is frequently used in applications, such as image processing, in which the operation is the same but the amount of data is very large.
[0003]
In ordinary arithmetic processing in a SIMD type microprocessor, a plurality of arithmetic units (processor elements, PEs) are arranged side by side and execute the same operation on a plurality of data items simultaneously.
[0004]
A SIMD type microprocessor delivers its full performance in processing in which all PEs operate simultaneously.
[0005]
In the (arithmetic) processing of image data, it is sometimes detected whether a region of the image data being processed is, for example, a “character area” or a “photo area”, and the processing applied to each region is changed according to the detection result. Such detection requires processing pixel data over a considerably wide region as the data to be judged. However, when a SIMD type microprocessor manipulates pixel data directly, it can process at most a range of about “8 × 8” pixels at a time. Therefore, to process pixel data over a wider range, “4 × 4” pixel data, for example, may be converted into a single data item (a block), which is referred to as blocking, and the region determination processing is then performed on these blocks. When “4 × 4” pixels form one block and “8 × 8” such blocks are collected and processed as one range, data covering a range of “32 × 32” pixels is processed as a result. FIG. 22 illustrates this. A considerably wide range of pixel data can thus be handled at a time.
[0006]
The blocking process described above requires reduction processing.
[0007]
However, it is usually difficult for a SIMD type microprocessor to perform this reduction processing. Conventional SIMD type microprocessors have therefore adopted various workarounds.
[0008]
For example, in one prior-art technique, an instruction indicating whether or not to output data is given to an output register built into each PE, thereby selecting the data to be output to the outside and so realizing reduction.
[0009]
In Japanese Patent Laid-Open No. 8-123683, reduction processing is realized by controlling the shift operation of the output shift register, which outputs data from the processor to the outside, so that pixel data is output at intervals.
[0010]
However, the above techniques are limited to reducing output data and cannot reduce data in the middle of a computation. Moreover, a circuit that controls whether or not data is output must be added to each PE, which raises the cost. Furthermore, the above techniques cannot enlarge reduced data at all.
[0011]
In another prior-art technique, when the data in the registers built into the PEs is output to the outside and saved in an external FIFO (First In First Out) memory, arbitrary data is thinned out before being stored in the FIFO memory, and the data is then transferred from the FIFO memory to the registers of the PEs, thereby realizing reduction. This case also requires an additional FIFO memory for one line, which raises the cost.
[0012]
A similar prior-art technique is disclosed in Japanese Patent Laid-Open No. 9-212637. There, each processor element is provided with a parallel/serial converter and a serial/parallel converter, thereby realizing reduction and enlargement processing. Because this hardware must also be mounted in this prior art, unnecessary cost is incurred when the reduction operation is not needed.
[0013]
[Problems to be solved by the invention]
An object of the present invention is to perform thinning processing (that is, reduction processing) and enlargement processing in a SIMD type microprocessor with a simple configuration and at low cost.
[0014]
[Means for Solving the Problems]
  The present invention has been made to achieve the above object. The data thinning device according to claim 1 of the present invention is
  a data thinning device connected to a data transfer port that a SIMD type microprocessor including a plurality of processor elements has
  for accessing, from outside the microprocessor, the general-purpose registers built into the processor elements, wherein the device
  is controlled by a sequencer included therein,
  performs its processing in parallel with the SIMD arithmetic processing of the SIMD type microprocessor by operating automatically in response to an operation start instruction set by the global processor included in the SIMD type microprocessor,
  selects and reads a plurality of arbitrary data items from among the data stored in the general-purpose registers built into the processor elements, and then writes the plurality of data items back to the general-purpose registers,
  and is characterized in that the interval between the processor elements in which the plurality of data items are stored by the write-back is smaller than the interval between the processor elements in which the plurality of data items were stored at the time of reading.
[0015]
  The data enlarging device according to claim 2 of the present invention is
  provided in a SIMD type microprocessor including a plurality of processor elements, and is
  a data enlarging device connected to a data transfer port for accessing, from outside the microprocessor, the general-purpose registers built into the processor elements, wherein the device
  is controlled by an internal sequencer,
  operates automatically in accordance with an operation start instruction set by the global processor included in the SIMD type microprocessor, and thereby performs its processing in parallel with the SIMD arithmetic processing of the SIMD type microprocessor,
  selects and reads a plurality of arbitrary data from the data stored in the general-purpose registers built into the processor elements, and then writes the plurality of data back to the general-purpose registers, and
  the interval between the processor elements in which the plurality of data is stored after the write-back is larger than the interval between the processor elements in which the plurality of data was stored at the time of reading.
[0016]
  The data thinning device according to claim 3 of the present invention is
  the data thinning device according to claim 1, which
  selects, from the read data, data stored in 1 bit at a predetermined position or data stored in a plurality of bits at a predetermined position, and,
  when writing back to a general-purpose register, stores a plurality of sets of the selected data in one general-purpose register.
[0017]
The data thinning device according to claim 4 of the present invention is the data thinning device according to claim 1, which
reads predetermined data, other than the arbitrary data to be selected, stored in the general-purpose registers built into the processor elements located near the first end of the row formed by all the processor elements,
writes that data back to the general-purpose registers built into the processor elements located near the second end of the row formed by all the processor elements,
and at the same time
reads predetermined data, other than the arbitrary data to be selected, stored in the general-purpose registers built into the processor elements located near the second end of the row formed by all the processor elements,
and writes that data back to the general-purpose registers built into the processor elements located near the first end of the row formed by all the processor elements.
[0018]
  The data enlarging device according to claim 5 of the present invention is
  the data enlarging device according to claim 2, which
  divides the read data into 1-bit data or multi-bit data, and
  writes the plurality of sets of data formed by the division back to the general-purpose registers built into separate processor elements.
[0019]
The data enlarging device according to claim 6 of the present invention is the data enlarging device according to claim 2, in which,
for a first processor element in which data is stored by the write-back and a second processor element in which data is stored by the write-back,
when one or a plurality of consecutive third processor elements exist between the first processor element and the second processor element and no data is written back to these third processor elements,
the data written back to the first processor element or the data written back to the second processor element
is copied and written into the general-purpose registers built into the third processor elements.
[0020]
  The data thinning device according to claim 7 of the present invention is the data thinning device according to claim 1, which
  includes a memory unit having a capacity of at least as many bits as the number of minimum structural units that make up the data to be reduced,
  associates the data to be thinned out, in each minimum structural unit making up the data to be reduced, with a bit position of the memory unit and stores predetermined data at the corresponding bit position, and,
  on the basis of the stored data,
performs the above read processing from the data stored in the general-purpose registers built into the processor elements and the subsequent write-back processing to the general-purpose registers.
[0021]
  The data enlarging device according to claim 8 of the present invention is the data enlarging device according to claim 2, which
  includes a memory unit having a capacity of at least as many bits as the number of minimum structural units that make up the data after the enlargement processing,
  associates the enlargement control target, in each minimum structural unit making up the data after the enlargement processing, with a bit position of the memory unit and stores predetermined data at the corresponding bit position, and,
  on the basis of the stored predetermined data,
performs the above read processing from the data stored in the general-purpose registers built into the processor elements and the subsequent write-back processing to the general-purpose registers.
[0022]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, preferred embodiments of the present invention will be described with reference to the accompanying drawings.
[0023]
FIG. 1 and FIG. 21 are block diagrams showing a schematic configuration of a SIMD type microprocessor 2 according to the present invention. The configuration shown in FIGS. 1 and 21 is a basic configuration of the SIMD type microprocessor 2 according to the first to eighth embodiments which will be described later.
[0024]
The SIMD type microprocessor 2 shown in FIG. 1 generally includes a global processor 4, a register file 6, an arithmetic array 8, and a thinning-out device 10 or an enlargement device 12.
[0025]
(1) Global processor 4
The global processor 4 itself is a so-called SISD (Single Instruction Stream, Single Data Stream) type processor, which includes a program RAM and a data RAM, decodes the program, and generates various control signals. This control signal is supplied to the register file 6 and the arithmetic array 8 in addition to various built-in blocks. When a GP (global processor) instruction is executed, various arithmetic processes and program control processes are performed using a built-in general-purpose register, an ALU (arithmetic logic unit), and the like.
[0026]
(2) Register file 6
The register file 6 holds the data processed by PE (processor element) instructions. As is well known, a PE (processor element) 3 is the structural unit that executes an individual operation in a SIMD (single instruction stream, multiple data stream) type processor. As shown by the register file 6 and the arithmetic array 8 in FIG. 21, the SIMD type microprocessor 2 in FIG. 21 includes 256 PEs 3. A PE instruction is a SIMD type instruction and performs the same processing simultaneously on a plurality of data held in the register file 6. Reading and writing of data in the register file 6 are controlled by control signals from the global processor 4. The read data is sent to the arithmetic array 8 (arithmetic unit) and, after arithmetic processing there, is written back into the register file 6.
[0027]
In the register file 6 of FIG. 21, each PE contains thirty-two 8-bit registers 34, and the 256 sets of 32 registers, one set per PE, form an array. For each PE, the registers 34 are called R0, R1, R2, ... R31.
[0028]
Further, the register file 6 can be accessed from outside the processor 2, so that a specific register can be read and written from the outside independently of control by the global processor 4.
[0029]
(3) Arithmetic array 8
The arithmetic array 8 performs the processing of PE instructions. All of its processing is controlled by the global processor 4.
[0030]
A 7-to-1 multiplexer 42 is placed at the connection between the register file 6 and the arithmetic array 8. As shown in FIGS. 21 and 4, each multiplexer 42 can select as its calculation target the data in the R0 to R31 registers 34 of the three PEs 3 to its left, of the three PEs 3 to its right, or of the PE 3 to which it belongs.
[0031]
Each PE 3 is given a serial number called a PE number. In the SIMD type microprocessor 2 in FIG. 21, since the number of PEs is 256, an 8-bit sequence (one of the 256 values from 00000000b to 11111111b, where the trailing “b” denotes binary notation) is given to each PE 3 as its PE number data. PE numbers could be given to the PEs 3 regardless of their positions, but in this specification they are assumed to be assigned in order from the (right) end. The PE whose PE number is “n” is written as
・ PE [n]
Therefore, the SIMD type microprocessor 2 of FIG. 21 consists of PE [0], PE [1], PE [2], ... PE [254], PE [255] from the right.
[0032]
(4) Thinning-out device 10 or enlargement device 12
Arbitrary data is selected from the data read from the register file 6 and written back to the register file 6. Writing normally proceeds from the lower-numbered PEs 3. For example, when 1/4 thinning-out is performed over the 256 PEs 3, the data in a specific register 34 of PE [0], PE [4], PE [8], PE [12], PE [16], ... is read out and written to PE [0], PE [1], PE [2], PE [3], PE [4], ... respectively. The “thinning process” will be described in detail later.
[0033]
<< First Embodiment >>
FIG. 2 is a block diagram of the SIMD type microprocessor 2 according to the first embodiment of the present invention.
[0034]
In FIG. 2, the thinning-out device 10 is connected to the external interface 14 of the register file 6 of the processor 2. When the address, CLK, and R/W signals are supplied to the external interface 14 from outside the processor 2, the register 34 of the PE 3 indicated by the address can be accessed: data is read from the register 34, or data is written to the register 34 via the WriteData bus 18.
[0035]
The thinning-out device 10 includes an address generation circuit 20 that generates the address of the PE to be accessed, a data buffer 22 that holds the read data until it is written, and a sequencer 24 that controls them. Its operation is set up from the global processor (GP) 4.
[0036]
In the above configuration, the 1/4 thinning-out operation in the SIMD type microprocessor 2 having 256 processor elements 3 is as follows.
[0037]
(1) Read the register data of PE [0] (address 0) and store it in the data buffer 22.
(2) Write the data in the data buffer 22 to the register of PE [0] (address 0).
(3) The data in the register of PE [4] (address 4) is read and stored in the data buffer 22.
(4) Write the data in the data buffer 22 to the register of PE [1] (address 1).
(5) The data in the register of PE [8] (address 8) is read and stored in the data buffer 22.
(6) Write the data in the data buffer 22 to the register of PE [2] (address 2).
(7) The register data of PE [12] (address 12) is read and stored in the data buffer 22.
(8) Write the data in the data buffer 22 to the register of PE [3] (address 3).
・・・・ (Since the same operation is repeated, it will be omitted.)
(123) The data in the register of PE [244] (address 244) is read and stored in the data buffer 22.
(124) The data in the data buffer 22 is written to the register of PE [61] (address 61).
(125) The data of the register of PE [248] (address 248) is read and stored in the data buffer 22.
(126) Write the data in the data buffer 22 to the register of PE [62] (address 62).
(127) The data in the register of PE [252] (address 252) is read and stored in the data buffer 22.
(128) Write the data in the data buffer 22 to the register of PE [63] (address 63).
[0038]
Note that the first read/write operation on PE [0] {(1) and (2) above} can be omitted because it does not change any data.
[0039]
By the above thinning-out operation, the data of every fourth PE register is packed toward PE [0], and as a result the registers of PE [0] to PE [63] hold the data thinned out to 1/4. This process requires 128 steps. However, since the thinning-out device 10 operates automatically once the GP 4 has instructed it to start, SIMD arithmetic processing by instructions of the processor 2 can be executed in parallel.
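The read/write sequence of steps (1) to (128) can be sketched in software as follows. This is only an illustrative model, not the hardware itself: one register row is represented as a Python list with one value per PE, and the names `thin_quarter`, `regs`, and `data_buffer` are hypothetical.

```python
def thin_quarter(regs, factor=4):
    """Model of the 1/4 thinning sequence on one register row.

    regs holds one value per PE (index = PE number). Each pass of the loop
    is one read step followed by one write step, as in steps (1)-(128).
    """
    regs = list(regs)          # work on a copy of the register row
    steps = 0
    for dst, src in enumerate(range(0, len(regs), factor)):
        data_buffer = regs[src]    # read PE[src] into the data buffer
        regs[dst] = data_buffer    # write the data buffer to PE[dst]
        steps += 2
    return regs, steps


row = list(range(256))             # assumed test data: value = PE number
thinned, steps = thin_quarter(row)
print(thinned[:4], steps)
```

Processing in ascending PE order is safe here because the destination number never exceeds the source number, so no value is overwritten before it has been read.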
[0040]
<< Second Embodiment >>
FIG. 3 is a block diagram of the SIMD type microprocessor 2 according to the second embodiment of the present invention.
[0041]
In FIG. 3, the enlargement device 12 is connected to the external interface 14 of the register file 6 of the processor 2. The enlargement device 12 includes an address generation circuit 20 that generates the address of the PE 3 to be accessed, a data buffer 22 that holds the read data until it is written, and a sequencer 24 that controls them. Its operation is set up from the global processor (GP) 4.
[0042]
In the above configuration, the 4 times enlargement operation in the SIMD type microprocessor 2 having 256 processor elements 3 is as follows.
[0043]
(1) The data in the register of PE [63] (address 63) is read and stored in the data buffer 22.
(2) Write the data in the data buffer 22 to the register of PE [252] (address 252).
(3) The register data of PE [62] (address 62) is read and stored in the data buffer 22.
(4) Write the data in the data buffer 22 to the register of PE [248] (address 248).
(5) The data in the register of PE [61] (address 61) is read and stored in the data buffer 22.
(6) Write the data in the data buffer 22 to the register of PE [244] (address 244).
(7) The register data of PE [60] (address 60) is read and stored in the data buffer 22.
(8) Write the data in the data buffer 22 to the register of PE [240] (address 240).
・・・・ (Since the same operation is repeated, it will be omitted.)
(123) The data in the register of PE [2] (address 2) is read and stored in the data buffer 22.
(124) The data in the data buffer 22 is written to the register of PE [8] (address 8).
(125) The data in the register of PE [1] (address 1) is read and stored in the data buffer 22.
(126) Write the data in the data buffer 22 to the register of PE [4] (address 4).
(127) The register data of PE [0] (address 0) is read and stored in the data buffer 22.
(128) Write the data in the data buffer 22 to the register of PE [0] (address 0).
[0044]
Note that the last read/write operation on PE [0] {(127) and (128) above} can be omitted because it does not change any data.
[0045]
By the above enlargement operation, the register data of PE [0] to PE [63] is spread out into the registers of every fourth PE, starting from PE [0], so that data enlarged four times is obtained. This process requires 128 steps. However, since the enlargement device 12 operates automatically once the GP 4 has instructed it to start, SIMD arithmetic processing by instructions of the processor 2 can be executed in parallel.
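The enlargement sequence of steps (1) to (128) can be modeled in the same way. The sketch below is an assumption-laden illustration (a register row as a Python list; `enlarge_fourfold` is a hypothetical name), showing why the sequence starts at PE [63] and works downward.

```python
def enlarge_fourfold(regs, factor=4):
    """Model of the 4x enlargement sequence on one register row.

    The sources PE[0]..PE[n/4 - 1] are visited from the highest number
    down, as in steps (1)-(128), and each value is written to PE[4 * src].
    """
    regs = list(regs)
    steps = 0
    for src in range(len(regs) // factor - 1, -1, -1):
        data_buffer = regs[src]            # read PE[src]
        regs[src * factor] = data_buffer   # write to PE[4 * src]
        steps += 2
    return regs, steps


row = [100 + pe for pe in range(256)]      # assumed test data
enlarged, steps = enlarge_fourfold(row)
print(enlarged[0], enlarged[4], enlarged[252], steps)
```

Descending order matters in this model: a write to PE [4 × src] never touches a lower-numbered source that is still waiting to be read, whereas an ascending sequence would overwrite PE [4] before its own data had been moved. The PEs between the written positions simply keep their previous contents.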
[0046]
<< Third Embodiment >>
In the blocking process of the area determination process described in the related art, the data held in units of blocks is often data indicating the result of a predetermined determination for each block. In addition, normally, the data indicates only “match / mismatch” regarding the predetermined determination. Such data can be expressed as 1-bit data. Therefore, data held in units of blocks is often only 1 bit.
[0047]
As shown in the block diagram of FIG. 4 (and FIG. 21), in the SIMD type microprocessor 2 according to the present invention, each PE 3 can select as its calculation target the data in the R0 to R31 registers 34 of the PE 3 itself and of the three PEs 3 on each of its left and right sides. Therefore, a single access can reach the data of 7 pixels centered on one pixel (including that pixel itself). Accessing data outside this range requires a shift operation or the like in the line (main scanning) direction.
[0048]
In the area determination process described above, however, a wide range of data may be required; for example, block data for about 32 blocks may be needed. Configuring the hardware so that up to 32 blocks can be accessed in a single operation would require a multiplexer that selects one of 32 inputs, which increases the hardware cost. Repeating the shift operation in the line direction instead would require a considerable number of steps and could degrade performance. For data consisting of only one bit, such as the block data, this problem can be solved by storing the block data of a plurality of PEs 3 in the register of a single PE 3.
[0049]
In the SIMD type microprocessor 2 according to the present invention, one register (for example, R0) stores 8 bits of data. Therefore, if data is stored as shown in FIG. 5, 35 blocks to the left and right can be handled in one access. In FIG. 5, bit5, bit6, and bit7 are treated as invalid data (5 × 1 bit pack processing).
[0050]
If 2 bits of block data are required, one register can be divided into 2-bit fields to store the data of a plurality of PEs. Since the register in the SIMD type microprocessor 2 according to the present invention is 8 bits wide, up to four pieces of 2-bit block data can be stored in one PE 3 (register). In that case, 28 blocks to the left and right can be handled in one access (4 × 2 bit pack processing).
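The figures of 35 and 28 blocks follow from the 7 PEs reachable in one access (the PE itself plus three on each side) multiplied by the number of block fields packed into one register. A minimal sketch of that arithmetic, with hypothetical function names:

```python
def fields_that_fit(bits_per_block, register_bits=8):
    """Largest number of block fields of the given width in one register."""
    return register_bits // bits_per_block


def blocks_per_access(fields_per_register, reachable_pes=7):
    """Blocks visible in one access: reachable PEs times fields per register."""
    return reachable_pes * fields_per_register


# 5 x 1 bit pack: five 1-bit fields are used, bit5 to bit7 are left invalid.
print(blocks_per_access(5))
# 4 x 2 bit pack: four 2-bit fields fill the 8-bit register.
print(blocks_per_access(fields_that_fit(2)))
```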
[0051]
FIG. 6 is a block diagram of a SIMD type microprocessor 2 according to the third embodiment of the present invention.
[0052]
Compared with the SIMD type microprocessor 2 of the first embodiment shown in FIG. 2, a barrel shifter 26 is added on the write path to the data buffer 22, so that data at an arbitrary bit position of the register 34 can be written to an arbitrary bit of the data buffer 22.
[0053]
In the above configuration, the “1/4 thinning-out” and “5 × 1 bit pack” operation in the SIMD type microprocessor 2 having 256 processor elements 3 is as follows. Here, “bit 0” of a PE register denotes the least significant bit of the given register. Similarly, “bit 1”, “bit 2”, “bit 3”, and “bit 4” denote successively higher-order bits counted from bit 0.
[0054]
(1) Read bit0 of the register of PE [0] and store it in bit0 of the data buffer 22.
(2) Read bit0 of the register of PE [4] and store it in bit1 of the data buffer 22.
(3) Read bit0 of the register of PE [8] and store it in bit2 of the data buffer 22.
(4) Read bit0 of the register of PE [12] and store it in bit3 of the data buffer 22.
(5) Read bit0 of the register of PE [16] and store it in bit4 of the data buffer 22.
(6) Write the data in the data buffer 22 to the register of PE [0].
(7) Read bit0 of the register of PE [20] and store it in bit0 of the data buffer 22.
(8) Read bit0 of the register of PE [24] and store it in bit1 of the data buffer 22.
(9) Read bit0 of the register of PE [28] and store it in bit2 of the data buffer 22.
(10) Read bit0 of the register of PE [32] and store it in bit3 of the data buffer 22.
(11) Read bit0 of the register of PE [36] and store it in bit4 of the data buffer 22.
(12) Write the data in the data buffer 22 to the register of PE [1].
(13) Read bit0 of the register of PE [40] and store it in bit0 of the data buffer 22.
(14) Read bit0 of the register of PE [44] and store it in bit1 of the data buffer 22.
(15) Read bit0 of the register of PE [48] and store it in bit2 of the data buffer 22.
(16) Read bit0 of the register of PE [52] and store it in bit3 of the data buffer 22.
(17) Read bit0 of the register of PE [56] and store it in bit4 of the data buffer 22.
(18) Write the data in the data buffer 22 to the register of PE [2].
・・・・ (Since the same operation is repeated, it will be omitted.)
(73) Read bit0 of the register of PE [240] and store it in bit0 of the data buffer 22.
(74) Read bit0 of the register of PE [244] and store it in bit1 of the data buffer 22.
(75) Read bit0 of the register of PE [248] and store it in bit2 of the data buffer 22.
(76) Read bit0 of the register of PE [252] and store it in bit3 of the data buffer 22.
(77) Write the data in the data buffer 22 to the register of PE [12].
[0055]
The data stored last {in the register of PE [12]} is 4-bit data. Note that the above operation arranges the data in the registers differently from FIG. 5: for example, the bit 0 data of the register of PE [0] ends up in “bit 0”, whereas in FIG. 5 it is stored in bit 2.
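The combined thinning and bit packing of steps (1) to (77) can be sketched as follows. The model is illustrative only: `thin_and_pack` is a hypothetical name, the barrel shifter 26 is represented by a left shift, and it is assumed that the unused upper bits of the data buffer are written as zero.

```python
def thin_and_pack(regs, factor=4, bits_per_reg=5):
    """Model of 1/4 thinning combined with 5 x 1 bit packing.

    bit0 of every fourth PE register is collected, five at a time, into
    bit0..bit4 of the data buffer, which is then written to the next
    destination PE, as in steps (1)-(77).
    """
    regs = list(regs)
    sources = range(0, len(regs), factor)      # PE[0], PE[4], PE[8], ...
    steps = 0
    dst = 0
    for start in range(0, len(sources), bits_per_reg):
        data_buffer = 0                        # assumption: cleared per group
        for pos, src in enumerate(sources[start:start + bits_per_reg]):
            bit0 = regs[src] & 1               # read bit0 of PE[src]
            data_buffer |= bit0 << pos         # shifter places it at bit `pos`
            steps += 1
        regs[dst] = data_buffer                # write the packed bits to PE[dst]
        steps += 1
        dst += 1
    return regs, steps


# Assumed test data: bit0 is 1 for every second block.
row = [1 if (pe // 4) % 2 == 0 else 0 for pe in range(256)]
packed, steps = thin_and_pack(row)
print(bin(packed[0]), bin(packed[1]), bin(packed[12]), steps)
```

With 256 PEs there are 64 source bits, which form twelve groups of five and a final group of four, so the model performs 64 reads and 13 writes.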
[0056]
<< Fourth Embodiment >>
When the number of pixels of the target image in one line (that is, in the main scanning direction) is larger than the number of PEs of the SIMD type microprocessor 2, the line is divided into segments of as many pixels as there are PEs and processed segment by segment. The divided pixel data is stored, for example, in a plurality of registers (rows) as shown in FIG. 7 (here, there are 1024 target pixels and 256 PEs).
[0057]
As described above, in the SIMD type microprocessor 2 according to the present invention, the arithmetic unit of one PE 3 can access the registers of up to three PEs 3 on its left side and up to three PEs 3 on its right side. For example, a predetermined filtering process on image data may require the data of several pixels to the left and right (in the main scanning direction) of a given pixel. The above arrangement provides for such processing.
[0058]
However, when pixel data is divided, processing that requires access to the left and right pixel data as described above may be unable to obtain the pixel data it needs at and near a division boundary. This is described concretely below.
[0059]
Assume that filter processing uses the three pixels to the right and the three pixels to the left. Assume also that the pixel data is stored in the register 34 as shown in FIG. The PE [255], which stores “pixel 255” in its R0 register, can access “pixel 254”, “pixel 253”, and “pixel 252”, the three pixels on its right side, without any problem. However, “pixel 256”, “pixel 257”, and “pixel 258”, the three pixels on its left, are stored in the R1 registers of PE [0], PE [1], and PE [2], so PE [255] cannot access them directly.
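The boundary problem can be checked with a small model. It assumes the simple layout implied by the example above, in which pixel p is stored in register row p // 256 of PE [p % 256] with no overlap; the function names are hypothetical.

```python
NUM_PES = 256   # number of processor elements
REACH = 3       # neighbouring PEs reachable on each side


def location(pixel):
    """Return (register row, PE number) for a pixel in the assumed layout."""
    return divmod(pixel, NUM_PES)


def unreachable_neighbours(pixel, total_pixels=1024):
    """List the pixels within +/-3 that are stored in a different register row."""
    row, _ = location(pixel)
    missing = []
    for neighbour in range(pixel - REACH, pixel + REACH + 1):
        if 0 <= neighbour < total_pixels and location(neighbour)[0] != row:
            missing.append(neighbour)
    return missing


print(unreachable_neighbours(255))   # pixels that PE[255] cannot reach
print(unreachable_neighbours(128))   # a pixel far from any boundary
```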
[0060]
As a countermeasure for this problem, pixel data is stored with an overlap in the PEs near the ends that form the division boundaries, for example, the PEs near PE [0] and the PEs near PE [255].
[0061]
In FIG. 8, pixel data is stored with an overlap of 4 pixels (data) near both ends. That is, the pixels 248 to 251 stored in the R0 registers of PE [252] to PE [255] are used as “reference pixels” by the PEs up to the pixel 247 (PE [247]), and the pixels 244 to 247 stored in the R1 registers of PE [0] to PE [3] are used as “reference pixels” by the PEs from the pixel 248 (PE [248]) onward. These “overlapping portions” are unnecessary once the image processing is completed and are deleted before output. The “dummy” entries in the drawing are unnecessary data placed at both ends of the image data for 1024 pixels and are likewise deleted before output. The size of such an “overlapping portion” is determined by how many reference pixels in the left-right direction the image processing as a whole requires. Usually, it is about 16 pixels.
[0062]
However, storing the pixel data of the “overlapping portions” in this way reduces the effective number of PEs, because that pixel data is held only for reference. As a result, the number of divisions increases and more registers are needed for storage. In FIG. 8, the R4 register is also used for data storage.
[0063]
When the image data is divided as shown in FIG. 8, the result after the thinning process is as shown in FIG. 10. FIG. 10 shows the result of 1/4 thinning in which four pixels in the main scanning direction are treated as one block. In the figure, blocks are named so that pixels 0-3 are block 0, pixels 4-7 are block 4, pixels 8-11 are block 8, and so on.
[0064]
First, in the situation of FIG. 8, the pixel data included in each block, for example the pixel values of the pixels 0 to 3, is compared with a certain pattern value; the block data is set to “1” if they match and to “0” if they do not, and is stored in the register of the PE associated with the top pixel of the block (in the example, PE [0], PE [4], PE [8], ...). As a result, the block data is arranged in the PEs whose PE numbers are multiples of 4, as shown in FIG. Subsequently, when it is thinned to 1/4 as in the first embodiment, it is packed toward PE [0] as shown in FIG.
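The generation of 1-bit block data can be sketched as below. The text does not specify how the comparison with the pattern value is made, so this sketch assumes exact equality of the four pixel values with a four-element pattern; `block_flags` and `pattern` are hypothetical names, and positions other than the top pixel of each block are simply left at 0 in the model.

```python
def block_flags(pixels, pattern, block=4):
    """Compute 1-bit block data and place it at the top pixel of each block.

    Assumption: a block "matches" when its pixel values equal `pattern`
    exactly. The result has one entry per PE; only multiples of `block`
    carry meaningful block data.
    """
    flags = [0] * len(pixels)
    for top in range(0, len(pixels), block):
        matches = tuple(pixels[top:top + block]) == tuple(pattern)
        flags[top] = 1 if matches else 0
    return flags


pixels = [1, 2, 3, 4,  1, 2, 3, 5,  1, 2, 3, 4]   # assumed test data
print(block_flags(pixels, pattern=(1, 2, 3, 4)))
```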
[0065]
When processing related to the area determination is performed using the thinned block data as shown in FIG. 10, it is sometimes necessary to refer to the registers (the data stored in the PEs 3) to the left and right. For example, assume in FIG. 10 that reference block data for up to three blocks (three PEs) on each side is required. For the data of the block 244 stored in the register R0 of PE [62] in FIG. 10, the reference block data to the right exists. To the left, however, only the block 248 of PE [63] exists as reference block data. Similarly, for the data of the block 248 stored in the register R1 of PE [1], the reference block data to the left exists, but to the right only the block 244 of PE [0] exists. Therefore, the processing cannot continue in this state.
[0066]
One conceivable solution to the above problem is to enlarge the overlapping portion of the pixel data before thinning. However, in this case, the “overlapping portion” needs to be four times as large as that in FIG. 8, and the effective number of PEs is further reduced.
[0067]
The SIMD type microprocessor 2 according to the fourth embodiment of the present invention solves these problems.
[0068]
FIG. 12 is a block diagram of a SIMD type microprocessor 2 according to the fourth embodiment of the present invention. Its configuration is substantially the same as that of the SIMD type microprocessor 2 according to the first embodiment shown in FIG. 2, except that in the fourth embodiment the data buffer 22 contains five buffers (buf0, buf1, buf2, buf3, buf4).
[0069]
In the above configuration, the 1/4 thinning-out operation in the SIMD type microprocessor 2 having 256 processor elements 3 is as follows.
[0070]
First, before the thinning process, block data is arranged in advance as shown in FIG. The necessary block data is stored in the PEs (registers) whose PE numbers are “multiples of 4”, as in FIG. 9. In addition, each PE (register) whose PE number is “multiple of 4 − 1” stores block data handled in the preceding division of the divided image data, shifted one PE number to the right. Similarly, each PE (register) whose PE number is “multiple of 4 + 1” stores block data handled in the following division of the divided image data, shifted one PE number to the left.
[0071]
In FIG. 11, consider the R1 register, for example. The PEs whose PE numbers are multiples of 4 store the block data obtained by processing the data stored in the R1 register (division 1). That is, as described above, four pixels in the main scanning direction are treated as one block, a predetermined operation is performed on the pixel data included in each block, and the result is stored in the R1 register of the PE associated with the first pixel of the block.
[0072]
Similarly, the PEs whose PE numbers are “multiple of 4 − 1” store the block data obtained by processing the data stored in the R0 register (division 0). That is, in “division 0”, four pixels in the main scanning direction are treated as one block, a predetermined calculation is performed on the pixel data included in each block, and the result is stored in the R1 register of the PE of “multiple of 4 − 1” (that is, shifted one PE to the right). Further, the PEs whose PE numbers are “multiple of 4 + 1” store the block data obtained by processing the data stored in the R2 register (division 2). In other words, in “division 2”, four pixels in the main scanning direction are treated as one block, a predetermined calculation is performed on the pixel data included in each block, and the result is stored in the R1 register of the PE of “multiple of 4 + 1” (that is, shifted one PE to the left).
[0073]
The thinning-out device 10 performs the following operation on the block data arranged as shown in FIG.
[0074]
(1) The register data of PE [235] is read and stored in buf0 of the data buffer 22.
(2) Write the data of buf0 of the data buffer 22 to the register of PE [0].
(3) The data in the register of PE [239] is read and stored in buf0 of the data buffer 22.
(4) Write the data of buf0 of the data buffer 22 to the register of PE [1].
(5) Read the register data of PE [243] and store it in buf0 of the data buffer 22.
(6) Write buf0 data of the data buffer 22 to the register of PE [2].
(7) The register data of PE [247] is read and stored in buf0 of the data buffer 22.
(8) Write the data of buf0 of the data buffer 22 to the register of PE [3].
(9) The register data of PE [5] is read and stored in buf1 of the data buffer 22.
(10) The data in the register of PE [9] is read and stored in buf2 of the data buffer 22.
(11) Read the data in the register of PE [13] and store it in buf3 of the data buffer 22.
(12) The register data of PE [17] is read and stored in buf4 of the data buffer 22.
(13) The data in the register of PE [4] is read and stored in buf0 of the data buffer 22.
(14) The buf0 data of the data buffer 22 is written to the register of PE [4].
(15) The data in the register of PE [8] is read and stored in buf0 of the data buffer 22.
(16) Write the buf0 data of the data buffer 22 to the register of PE [5].
(17) The data in the register of PE [12] is read and stored in buf0 of the data buffer 22.
(18) Write the buf0 data of the data buffer 22 to the register of PE [6].
(19) The data in the register of PE [16] is read and stored in buf0 of the data buffer 22.
(20) Write the buf0 data of the data buffer 22 to the register of PE [7].
・・・・ (Since the same operation is repeated, it will be omitted.)
(131) The register data of PE [240] is read and stored in buf0 of the data buffer 22.
(132) The buf0 data of the data buffer 22 is written to the register of PE [63].
(133) The data in the register of PE [244] is read and stored in buf0 of the data buffer 22.
(134) The buf0 data of the data buffer 22 is written to the register of PE [64].
(135) The data in the register of PE [248] is read and stored in buf0 of the data buffer 22.
(136) The data of buf0 of the data buffer 22 is written to the register of PE [65].
(137) Write the data of buf1 of the data buffer 22 to the register of PE [66].
(138) The buf2 data of the data buffer 22 is written into the register of PE [67].
(139) The data of buf3 of the data buffer 22 is written to the register of PE [68].
(140) The buf4 data of the data buffer 22 is written into the register of PE [69].
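The sequence of steps (1) to (140) above can be replayed in software to check where each block ends up. The following is only a minimal sketch under stated assumptions: each PE is modelled as a single list entry, the five-entry list `buf` stands in for buf0 to buf4 of the data buffer 22, and the function name `thin_with_boundary_blocks` is hypothetical and does not appear in the original.

```python
NUM_PES = 256  # number of processor elements assumed in the text


def thin_with_boundary_blocks(regs):
    """Replay steps (1)-(140) on a list holding one register value per PE."""
    regs = list(regs)
    buf = [None] * 5  # stands in for buf0..buf4 of the data buffer 22

    # Steps (1)-(8): division-0 blocks at PE[235], [239], [243], [247]
    # are moved to PE[0]..PE[3] through buf0.
    for dst, src in enumerate((235, 239, 243, 247)):
        buf[0] = regs[src]
        regs[dst] = buf[0]

    # Steps (9)-(12): division-2 blocks at PE[5], [9], [13], [17] are
    # saved in buf1..buf4 before the packing below overwrites them.
    for i, src in enumerate((5, 9, 13, 17), start=1):
        buf[i] = regs[src]

    # Steps (13)-(136): PE[4k] -> PE[k + 3] for k = 1..62, through buf0.
    for k in range(1, 63):
        buf[0] = regs[4 * k]
        regs[k + 3] = buf[0]

    # Steps (137)-(140): the saved blocks go to PE[66]..PE[69].
    for i in range(1, 5):
        regs[65 + i] = buf[i]
    return regs
```

Feeding in `list(range(NUM_PES))`, so that every value records the PE it came from, makes the final arrangement easy to inspect: positions 0 to 3 hold the division-0 blocks, positions 4 to 65 the packed blocks, and positions 66 to 69 the saved division-2 blocks.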
[0075]
With the above operation, the block data after thinning is arranged as shown in FIG. Unlike in FIG. 10, the data of block 244 stored in the R0 register of PE [65] has reference block data on both its left and its right; likewise, the data of block 248 stored in the R1 register of PE [4] has reference block data on both its left and its right.
[0076]
In the third embodiment, the number of data buffers 22 is five. This prevents the data in the registers of PE [5], PE [9], PE [13], and PE [17] from being overwritten, and thus lost, by the data that has been packed to 1/4 by the thinning. This prevention of loss can also be realized in the configuration of the first embodiment shown in FIG. That is, the register data of PE [5], PE [9], PE [13], and PE [17] is temporarily saved in the registers of PEs that are not otherwise used in the above processing, for example PE [235], PE [239], PE [243], and PE [247], and is read back from those registers when writing to PE [66] to PE [69]. Only the sequence needs to be changed.
[0077]
Even in the thinning-out process using the bit pack shown in the third embodiment, when the image data is divided by the number of PEs, processing similar to that of the fourth embodiment may be required in order to avoid the problem of missing reference data at the division boundaries.
[0078]
<< Fifth Embodiment >>
After the area determination process using the block data has been performed, the image processing content is decided for each pixel on the basis of the determination result. The area determination result obtained from data thinned out to 1/4 (or another ratio) must therefore be enlarged by a factor of four (or the corresponding factor) and returned to the original pixel positions. If the data was thinned out to 1/4 by the SIMD type microprocessor 2 of the first embodiment, enlarging it with the SIMD type microprocessor 2 of the second embodiment returns the determination result data to the corresponding original pixel positions. In the third embodiment, however, the data is bit-packed, so a further embodiment is needed to restore it (that is, to unpack the bit pack).
[0079]
FIG. 14 is a block diagram of a SIMD type microprocessor 2 according to the fifth embodiment of the present invention.
[0080]
Compared with the SIMD type microprocessor 2 of the second embodiment shown in FIG. 3, a barrel shifter 26' is added on the read side of the data buffer 22, so that data at an arbitrary bit position in the data buffer 22 can be written to an arbitrary bit of the register 34.
[0081]
In the above configuration, the operation of “4 times expansion” and “5 × 1 bit unpacking” in the SIMD type microprocessor 2 having 256 processor elements 3 is as follows. Here, “bit 0 of the data buffer” indicates the least significant bit of the predetermined data buffer 22. Similarly, “bit 1”, “bit 2”, “bit 3”, and “bit 4” indicate the higher-order bits one by one as viewed from bit 0.
[0082]
(1) The data in the register of PE [12] is read and stored in the data buffer 22.
(2) Write bit3 data of the data buffer 22 to the register of PE [252].
(3) Write the bit2 data of the data buffer 22 to the register of PE [248].
(4) Write bit1 data of the data buffer 22 to the register of PE [244].
(5) Write the data of bit 0 of the data buffer 22 to the register of PE [240].
(6) The data in the register of PE [11] is read and stored in the data buffer 22.
(7) Write the data of bit 4 of the data buffer 22 to the register of PE [236].
(8) Write bit3 data of the data buffer 22 to the register of PE [232].
(9) Write the bit2 data of the data buffer 22 to the register of PE [228].
(10) Write the data of bit1 of the data buffer 22 to the register of PE [224].
(11) Write the data of bit 0 of the data buffer 22 to the register of PE [220].
・・・・ (Since the same operation is repeated, it will be omitted.)
(72) The register data of PE [0] is read and stored in the data buffer 22.
(73) The bit4 data of the data buffer 22 is written to the register of PE [16].
(74) The bit3 data of the data buffer 22 is written to the register of PE [12].
(75) Write the bit2 data of the data buffer 22 to the register of PE [8].
(76) Write the bit1 data of the data buffer 22 to the register of PE [4].
(77) The data of bit 0 of the data buffer 22 is written to the register of PE [0].
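Steps (1) to (77) above follow a regular pattern: bit b of the packed word in PE [k] is written to PE [(5k + b) × 4]. The sketch below replays that pattern in software. It is a minimal illustration under stated assumptions, not the circuit itself: each register is modelled as a Python integer, the extracted bit is placed in bit 0 of the destination register (the text allows an arbitrary destination bit), and the names `unpack_and_expand`, `BITS_PER_WORD` and `SCALE` are hypothetical.

```python
BITS_PER_WORD = 5  # "5 x 1 bit" pack: five 1-bit results per register
SCALE = 4          # 4x expansion


def unpack_and_expand(regs, num_blocks=64):
    """Unpack bit-packed results and spread them to every 4th PE."""
    regs = list(regs)
    last_word = (num_blocks - 1) // BITS_PER_WORD  # PE[12] for 64 blocks
    # Work from the highest packed PE down so that no packed word is
    # overwritten before it has been read.
    for pe in range(last_word, -1, -1):
        data_buffer = regs[pe]  # read into the data buffer 22
        used_bits = min(BITS_PER_WORD, num_blocks - pe * BITS_PER_WORD)
        for bit in range(used_bits - 1, -1, -1):
            dst = (pe * BITS_PER_WORD + bit) * SCALE
            # Barrel shifter 26': select one bit of the buffer.
            regs[dst] = (data_buffer >> bit) & 1
    return regs
```

With 64 one-bit results, PE [12] carries only four valid bits (bits 0 to 3), which matches steps (2) to (5); every other packed PE carries five.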
[0083]
Note that the last write operation, step (77) to PE [0], can be omitted because it involves no conversion of data.
[0084]
<< Sixth Embodiment >>
After the area determination process using the block data has been performed, the image processing content is decided for each pixel on the basis of the determination result. The area determination result obtained from data thinned out to 1/4 (or another ratio) must therefore be enlarged by a factor of four (or the corresponding factor) and returned to the original pixel positions. If the data was thinned out to 1/4 by the SIMD type microprocessor 2 of the first embodiment, enlarging it with the SIMD type microprocessor 2 of the second embodiment returns the determination result data to the corresponding original pixel positions.
[0085]
As described for the fifth embodiment, when the area determination result data is enlarged by a factor of four (or another factor) using the SIMD type microprocessor 2 of the second embodiment, valid area determination result data is arranged only in the PEs whose PE number is a multiple of 4. FIG. 15 shows such a state.
[0086]
In processing after the enlargement, a PE whose PE number is not a multiple of 4 uses the area determination result data stored in the register of the PE whose number is the nearest multiple of 4 below its own. That is, PE [0] to PE [3] use the area determination result data in the register of PE [0], PE [4] to PE [7] use the data in the register of PE [4], PE [8] to PE [11] use the data in the register of PE [8], and so on. In other words, a PE whose PE number is "a multiple of 4" refers to the data in its own register, a PE whose number is "a multiple of 4, plus 1" refers to the data in the register of the PE one to its right, a PE whose number is "a multiple of 4, plus 2" refers to the data in the register of the PE two to its right, and a PE whose number is "a multiple of 4, plus 3" refers to the data in the register of the PE three to its right.
[0087]
As described above, the processing differs from PE to PE. The characteristic strength of the SIMD type microprocessor, namely executing the same processing on all PEs in parallel, therefore cannot be exploited.
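The per-PE difference described above can be made concrete with a small calculation. This is an illustrative sketch only; the function names `source_pe` and `reference_offset` are hypothetical, and a scale factor of 4 is assumed as in the text.

```python
def source_pe(pe_number, scale=4):
    """PE whose register holds the valid result used by pe_number."""
    return pe_number - pe_number % scale


def reference_offset(pe_number, scale=4):
    """How many PEs away the valid result lies (0 means the PE's own register)."""
    return pe_number % scale
```

Because `reference_offset` takes four different values across neighbouring PEs, a single common instruction cannot fetch the right data for all of them, which is the difficulty the sixth embodiment addresses.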
[0088]
The SIMD type microprocessor 2 shown in the block diagram of FIG. 3 is used in the sixth embodiment of the present invention as well as in the second embodiment. In the sixth embodiment, however, the processing sequence differs from that of the second embodiment.
[0089]
In the above configuration, the 4 times enlargement operation in the SIMD type microprocessor 2 having 256 processor elements 3 is as follows.
[0090]
(1) The data in the register of PE [63] (address 63) is read and stored in the data buffer 22.
(2) Write the data in the data buffer 22 to the register of PE [255] (address 255).
(3) Write the data in the data buffer 22 to the register of PE [254] (address 254).
(4) Write the data in the data buffer 22 to the register of PE [253] (address 253).
(5) Write the data in the data buffer 22 to the register of PE [252] (address 252).
(6) The data in the register of PE [62] (address 62) is read and stored in the data buffer 22.
(7) Write the data in the data buffer 22 to the register of PE [251] (address 251).
(8) Write the data in the data buffer 22 to the register of PE [250] (address 250).
(9) Write the data in the data buffer 22 to the register of PE [249] (address 249).
(10) Write the data in the data buffer 22 to the register of PE [248] (address 248).
(11) The data in the register of PE [61] (address 61) is read and stored in the data buffer 22.
(12) Write the data in the data buffer 22 to the register of PE [247] (address 247).
(13) Write the data in the data buffer 22 to the register of PE [246] (address 246).
(14) Write the data in the data buffer 22 to the register of PE [245] (address 245).
(15) Write the data in the data buffer 22 to the register of PE [244] (address 244).
・・・・ (Since the same operation is repeated, it will be omitted.)
(311) The register data of PE [1] (address 1) is read and stored in the data buffer 22.
(312) The data in the data buffer 22 is written to the register of PE [7] (address 7).
(313) Write the data in the data buffer 22 to the register of PE [6] (address 6).
(314) Write the data in the data buffer 22 to the register of PE [5] (address 5).
(315) Write the data in the data buffer 22 to the register of PE [4] (address 4).
(316) The data in the register of PE [0] (address 0) is read and stored in the data buffer 22.
(317) Write the data in the data buffer 22 to the register of PE [3] (address 3).
(318) Write the data in the data buffer 22 to the register of PE [2] (address 2).
(319) Write the data in the data buffer 22 to the register of PE [1] (address 1).
(320) Write the data in the data buffer 22 to the register of PE [0] (address 0).
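Steps (1) to (320) above amount to one read followed by four writes for each of the 64 source PEs, working from PE [63] down to PE [0]. A minimal software sketch of that sequence follows, assuming one list entry per PE register; the function name `expand_with_replication` is hypothetical.

```python
def expand_with_replication(regs, scale=4):
    """Copy PE[k] into PE[scale*k] .. PE[scale*k + scale - 1], top down."""
    regs = list(regs)
    num_sources = len(regs) // scale
    # Highest source first, so a source is always read before any
    # write can land on it.
    for src in range(num_sources - 1, -1, -1):
        data_buffer = regs[src]  # one read into the data buffer 22
        for dst in range(src * scale + scale - 1, src * scale - 1, -1):
            regs[dst] = data_buffer  # 'scale' writes per read
    return regs
```

For 256 PEs and a factor of 4 this performs 64 reads and 256 writes, which agrees with the 320 numbered steps.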
[0091]
Note that the last write operation, step (320) to PE [0], can be omitted because it involves no conversion of data.
[0092]
The data after enlargement by the above operation is as shown in FIG. Consequently, every PE can refer to the data stored in its own register as the area determination result.
[0093]
<< Seventh Embodiment >>
In the configuration of the first embodiment described above, the unit of thinning is typically a power of 2 (2, 4, 8, 16, ...). It is therefore an effective configuration for processing such as blocking. In processing that reduces image data, however, the thinning interval is not constant. It is therefore desirable to provide a means by which the thinning interval can be designated arbitrarily.
[0094]
FIG. 17 is a block diagram of the SIMD type microprocessor 2 according to the seventh embodiment of the present invention. Although the configuration is basically the same as that of the first embodiment of FIG. 2, the sequencer 24 is provided with a RAM 30 for designating the thinning interval. The RAM 30 has a capacity of at least one bit per pixel for the number of main scanning pixels of the image data to be reduced.
[0095]
In the above configuration, the reduction operation is performed as follows.
[0096]
(1) The read pointer and write pointer of the address generation circuit 20 are set to “0”. At the same time, the address pointer of the RAM 30 is set to “0”.
(2) Read the register data of the PE whose address is the value of the read pointer and store it in the data buffer 22. At the same time, data in the RAM 30 is read from the address specified by the address pointer of the RAM 30, and the read pointer and the RAM address pointer are incremented by “1”.
(3) If the read data in the RAM 30 is “1”, the data in the data buffer 22 is written to the PE register whose address is the write pointer, and the write pointer is incremented by “1”.
(4) The above operations (2) and (3) are repeated until the read pointer exceeds the maximum number of PEs.
[0097]
Through the above operation, pixels corresponding to locations (addresses) where "0" is stored in the RAM 30 are thinned out, and pixels corresponding to locations (addresses) where "1" is stored in the RAM 30 are packed toward PE [0] and arranged in the registers.
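The pointer handling of steps (1) to (4) above can be sketched in software as follows. This is a minimal illustration under stated assumptions: the contents of the RAM 30 are modelled as a list `keep_mask` of 0/1 values with one entry per PE, and the function name `reduce_line` is hypothetical.

```python
def reduce_line(regs, keep_mask):
    """Pack the pixels whose mask bit is 1 toward PE[0].

    Returns the register contents and the final write pointer, which
    equals the number of pixels kept.
    """
    regs = list(regs)
    read_ptr = write_ptr = ram_ptr = 0  # step (1)
    while read_ptr < len(regs):         # step (4)
        data_buffer = regs[read_ptr]    # step (2): read the PE register
        keep = keep_mask[ram_ptr]       # step (2): read the RAM
        read_ptr += 1
        ram_ptr += 1
        if keep == 1:                   # step (3): conditional write-back
            regs[write_ptr] = data_buffer
            write_ptr += 1
    return regs, write_ptr
```

Since the write pointer never runs ahead of the read pointer, the packing can be done in place; entries at and above the returned write pointer simply keep their old contents.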
[0098]
Even when the number of pixels of the image data to be reduced is larger than the number of PEs and the image data is divided for processing, the processing can be continued simply by not clearing the RAM address pointer in step (1) above during the reduction processing of the second and subsequent divided image data. In the reduction processing of the second and subsequent divided image data, the following processing is performed.
[0099]
(1) The read pointer and write pointer of the address generation circuit 20 are set to “0”.
(2) Read the register data of the PE whose address is the value of the read pointer and store it in the data buffer 22. At the same time, data in the RAM 30 is read from the address specified by the address pointer of the RAM 30, and the read pointer and the RAM address pointer are incremented by “1”.
(3) If the read data in the RAM 30 is “1”, the data in the data buffer 22 is written to the PE register whose address is the write pointer, and the write pointer is incremented by “1”.
(4) The above operations (2) and (3) are repeated until the read pointer exceeds the maximum number of PEs.
[0100]
In the above operation, when the image data is divided, the reduced image data is packed toward the lower PE side separately for each division (the upper PEs are filled with "dummy" data). In addition, the reduced image data lacks the reference pixel data described in the "fourth embodiment". Therefore, as shown in FIG. 18, the read-side register 6' (image before reduction) and the write-side register 6'' (image after reduction) are connected separately. In the form of FIG. 18, only the image data in the read-side register is divided and stored, and the reduction operation is continued by advancing only the read pointer to the second and subsequent image data. As a result, the reduced data of each subsequent division is written starting from the PE that follows the end of the reduced image data of the preceding division. In this case, the write pointer is not cleared even when the read pointer advances to the second and subsequent image data.
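The effect of separating the read-side and write-side registers can be sketched as follows. This is a simplified model under stated assumptions, not the FIG. 18 hardware: each division of the input is passed as its own list (standing in for the successive contents of the read-side register 6'), the RAM 30 is a single 0/1 list `keep_mask` covering the whole line, and the name `reduce_divided` is hypothetical. The sketch also assumes the number of kept pixels does not exceed `num_pes`.

```python
def reduce_divided(divisions, keep_mask, num_pes):
    """Reduce a line that was split into divisions, packing contiguously."""
    write_regs = [None] * num_pes  # write-side register 6''
    write_ptr = 0                  # not cleared between divisions
    ram_ptr = 0                    # not cleared between divisions
    for read_regs in divisions:    # contents of read-side register 6'
        for read_ptr in range(len(read_regs)):
            data_buffer = read_regs[read_ptr]
            keep = keep_mask[ram_ptr]
            ram_ptr += 1
            if keep == 1:
                write_regs[write_ptr] = data_buffer
                write_ptr += 1
    return write_regs, write_ptr
```

Because neither the write pointer nor the RAM pointer is reset, the kept pixels of the second division follow directly after those of the first, with no "dummy" gap between them.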
[0101]
<< Eighth Embodiment >>
In the configuration of the second embodiment described above, the unit of enlargement is typically a power of 2 (2, 4, 8, 16, ...). It is therefore an effective configuration for enlargement processing that reflects the block area determination data back onto the current image data. In processing such as enlargement of image data, however, the enlargement interval is not constant. It is therefore desirable to provide a means by which the enlargement interval can be designated arbitrarily.
[0102]
FIG. 19 is a block diagram of the SIMD type microprocessor 2 according to the eighth embodiment of the present invention. Although the configuration is basically the same as that of the second embodiment in FIG. 3, the sequencer 24 is provided with a RAM 30 for designating the enlargement interval. The RAM 30 has a capacity of at least one bit per pixel for the number of main scanning pixels of the enlarged image data.
[0103]
In the above configuration, the enlargement operation is performed as follows.
[0104]
(1) The read pointer of the address generation circuit 20 is set to the number of main scanning pixels of the image data before enlargement, and the write pointer is set to the number of main scanning pixels of the image data after enlargement. At the same time, the address pointer of the RAM 30 is set to the number of main scanning pixels of the image data after enlargement.
(2) Read the register data of the PE whose address is the value of the read pointer and store it in the data buffer 22. At the same time, data in the RAM 30 is read from the address specified by the address pointer of the RAM 30, and the address pointer of the RAM 30 is decremented by “1”. When the read data in the RAM 30 is “1”, the read pointer is decremented by “1”.
(3) Write the data in the data buffer 22 to the PE register whose address is the write pointer. The write pointer is decremented by “1”.
(4) The above operations (2) and (3) are repeated until the write pointer becomes zero.
[0105]
As a result of the above operation, at each location (address) where "0" is stored in the RAM 30, the read pointer is not advanced, so the same pre-enlargement pixel data indicated by the read pointer is written again at the next write position (that is, the pixel is duplicated). At each location (address) where "1" is stored in the RAM 30, the read pointer is advanced, so the next write uses the next pre-enlargement pixel data. In this way, an enlargement operation can be realized.
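The enlargement steps (1) to (4) above can be sketched in software as follows. This is a minimal illustration, and several points are assumptions rather than statements of the original: zero-based indices are used (so each pointer starts at "count minus 1" and the loop ends after position 0 has been written), the RAM 30 is modelled as a 0/1 list with one entry per output pixel, and a "1" is taken to mark the lowest-numbered output copy of each source pixel, so that entry 0 is always 1. The name `enlarge_line` is hypothetical.

```python
def enlarge_line(regs, n_before, first_copy_mask):
    """Enlarge regs[0:n_before] in place to len(first_copy_mask) pixels."""
    regs = list(regs)
    n_after = len(first_copy_mask)
    read_ptr = n_before - 1   # step (1), zero-based
    write_ptr = n_after - 1
    ram_ptr = n_after - 1
    while write_ptr >= 0:                     # step (4)
        data_buffer = regs[read_ptr]          # step (2): read PE register
        advance = first_copy_mask[ram_ptr]    # step (2): read the RAM
        ram_ptr -= 1
        if advance == 1:
            read_ptr -= 1                     # move on to the next source pixel
        regs[write_ptr] = data_buffer         # step (3): write-back
        write_ptr -= 1
    return regs
```

Working from the top of the line downward keeps the write position at or above the read position, so no source pixel is overwritten before it has been read.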
[0106]
In the above operation, when the image data is divided, the enlarged image data may overwrite image data before enlargement that has not yet been read out. Therefore, as shown in FIG. 20, the read-side register 6' (image data before enlargement) and the write-side register 6'' (image data after enlargement) are connected separately. By transferring the enlarged data to a divided image data storage register, the overwriting problem described above can be avoided. The operation is as follows.
[0107]
(1) The read pointer of the address generation circuit 20 is set to the number of main scanning pixels of the image data before enlargement. The write pointer is set to the remainder obtained by dividing the number of main scanning pixels of the image data after enlargement by the number of PEs. Further, the address pointer of the RAM 30 is set to the number of main scanning pixels of the image data after enlargement.
(2) Read the register data of the PE whose address is the value of the read pointer and store it in the data buffer 22. At the same time, data in the RAM 30 is read from the address specified by the address pointer of the RAM 30, and the address pointer of the RAM 30 is decremented by “1”. When the read data in the RAM 30 is “1”, the read pointer is decremented by “1”.
(3) Write the data in the data buffer 22 to the PE register whose address is the write pointer. The write pointer is decremented by “1”.
(4) The above operations (2) and (3) are repeated until the write pointer becomes zero.
(5) The data in the post-enlargement register is transferred to the “first division” of the divided image data storage register.
(6) The write pointer is set to the number of PEs (value).
(7) Read the register data of the PE whose address is the value of the read pointer and store it in the data buffer 22. At the same time, data in the RAM 30 is read from the address specified by the address pointer of the RAM 30, and the address pointer of the RAM 30 is decremented by “1”. When the read data in the RAM 30 is “1”, the read pointer is decremented by “1”.
(8) Write the data in the data buffer 22 to the PE register whose address is the write pointer. The write pointer is decremented by “1”.
(9) The above operations (7) and (8) are repeated until the write pointer becomes zero.
(10) Transfer the data in the post-enlargement register to the “next division” of the divided image data storage register.
(11) Repeat (6) to (10) above until the read pointer becomes zero.
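Steps (1) to (11) above can be sketched in software as follows. This is a simplified model under stated assumptions: zero-based indices are used, the RAM 30 is a 0/1 list in which a "1" marks the lowest-numbered output copy of each source pixel, the divided image data storage register is modelled as a list of chunks, the loop ends when the RAM pointer is exhausted rather than by testing the read pointer, and a remainder of zero is treated as one full chunk. The name `enlarge_divided` is hypothetical.

```python
def enlarge_divided(src, first_copy_mask, num_pes):
    """Enlarge src into chunks of at most num_pes pixels, top chunk first."""
    n_after = len(first_copy_mask)
    read_ptr = len(src) - 1             # step (1)
    ram_ptr = n_after - 1
    chunk = n_after % num_pes or num_pes  # step (1): remainder, or a full chunk
    storage = []                        # divided image data storage register
    while ram_ptr >= 0:                 # step (11), tested on the RAM pointer
        write_regs = [None] * num_pes   # write-side register 6''
        for write_ptr in range(chunk - 1, -1, -1):
            data_buffer = src[read_ptr]           # steps (2)/(7)
            if first_copy_mask[ram_ptr] == 1:
                read_ptr -= 1
            ram_ptr -= 1
            write_regs[write_ptr] = data_buffer   # steps (3)/(8)
        storage.append(write_regs[:chunk])        # steps (5)/(10): transfer
        chunk = num_pes                           # step (6)
    storage.reverse()  # chunks were produced from the top of the line down
    return [pixel for part in storage for pixel in part]
```

The source data is only ever read from `src` and never written, which mirrors the point of FIG. 20: with separate read-side and write-side registers, enlarged data cannot overwrite pixels that are still waiting to be read.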
[0108]
If the image data before enlargement is also divided, this can be handled by replacing the contents of the pre-enlargement register during processing and adjusting the control of each pointer as appropriate.
[0109]
<< Other embodiments >>
Since the thinning-out devices in the above embodiments are mainly different in the sequence in the sequencer 24, it is naturally possible to easily realize a configuration that combines several embodiments. The same applies to the expander in the above embodiment. Furthermore, for example, it is also possible to configure one device that has all the functions of the first to sixth embodiments.
[0110]
[Effects of the invention]
By using the present invention, the following effects can be obtained.
[0111]
When the SIMD type microprocessor according to the first embodiment is used, thinning-out processing of pixel data in the main scanning direction can be realized with a simple configuration and at low cost.
[0112]
When the SIMD type microprocessor according to the second embodiment is used, enlargement processing of pixel data in the main scanning direction can be realized with a simple configuration and at low cost.
[0113]
When the SIMD type microprocessor according to the third embodiment is used, a wider range of reference pixel data can be handled in a single process.
[0114]
When the SIMD type microprocessor according to the fourth embodiment is used, thinning-out processing can be performed on image data whose number of pixels exceeds the number of PEs while taking into account the (neighboring) reference data necessary for filter processing and the like.
[0115]
When the SIMD type microprocessor according to the fifth embodiment is used, it is possible to enlarge the bit-packed data.
[0116]
When the SIMD type microprocessor according to the sixth embodiment is used, SIMD arithmetic processing can be performed with few steps after enlargement processing and blocking processing.
[0117]
When the SIMD type microprocessor according to the seventh embodiment is used, image reduction processing can be realized.
[0118]
When the SIMD type microprocessor according to the eighth embodiment is used, image enlargement processing can be realized.
[Brief description of the drawings]
FIG. 1 is a block diagram (1) showing a schematic configuration of a SIMD type microprocessor according to the present invention;
FIG. 2 is a block diagram of a SIMD type microprocessor according to the first embodiment.
FIG. 3 is a block diagram of a SIMD type microprocessor according to a second embodiment.
FIG. 4 is a block diagram illustrating a relationship between a calculation unit and a register.
FIG. 5 is a block diagram illustrating a relationship between a calculation unit and a register in which bit pack data is stored.
FIG. 6 is a block diagram of a SIMD type microprocessor according to a third embodiment.
FIG. 7 is a schematic diagram showing a situation in which pixel data is stored in a register when the number of pixels is larger than the number of PEs of a SIMD type microprocessor.
FIG. 8 is a schematic diagram showing a situation in which pixel data is stored in a register while forming an overlapping portion when the number of pixels is larger than the number of PEs of a SIMD type microprocessor.
FIG. 9 is a schematic diagram illustrating a state in which block data before thinning is stored in a register.
FIG. 10 is a schematic diagram showing a situation in which block data after thinning is stored in a register.
FIG. 11 is a schematic diagram showing a storage state of block data used in the SIMD type microprocessor according to the fourth embodiment.
FIG. 12 is a block diagram of a SIMD type microprocessor according to a fourth embodiment.
FIG. 13 is a schematic diagram illustrating a preferable situation of storing block data after thinning into a register, which is formed by a SIMD type microprocessor according to the fourth embodiment;
FIG. 14 is a block diagram of a SIMD type microprocessor according to a fifth embodiment.
FIG. 15 is a schematic diagram showing a storage state of block data used in the SIMD type microprocessor according to the sixth embodiment.
FIG. 16 is a schematic diagram showing a storage state of block data formed by the SIMD type microprocessor according to the sixth embodiment.
FIG. 17 is a block diagram (1) of a SIMD type microprocessor according to a seventh embodiment;
FIG. 18 is a block diagram (2) of a SIMD type microprocessor according to a seventh embodiment;
FIG. 19 is a block diagram (1) of a SIMD type microprocessor according to an eighth embodiment;
FIG. 20 is a block diagram (2) of a SIMD type microprocessor according to an eighth embodiment;
FIG. 21 is a block diagram (2) showing a schematic configuration of a SIMD type microprocessor according to the present invention;
FIG. 22 is a conceptual diagram of blocking processing.
[Explanation of symbols]
2 ... SIMD type microprocessor,
3 ... Processor element (PE),
4 ... Global processor (GP),
6, 6', 6'' ... Register file,
8 ... Arithmetic array,
10 ... Thinning-out device,
12 ... Enlargement device,
14, 14', 14'' ... External interface,
20, 20', 20'' ... Address generation circuit,
22 ... Data buffer,
24 ... Sequencer,
26, 26' ... Barrel shifter,
30 ... RAM (Random Access Memory),
34 ... Register.

Claims

A data thinning device provided in a SIMD type microprocessor including a plurality of processor elements,
the data thinning device being connected to a data transfer port for accessing, from outside the microprocessor, the general-purpose registers built into the processor elements,
being controlled by a sequencer included therein,
operating automatically in response to an operation-start instruction setting made by a global processor included in the SIMD type microprocessor, thereby performing its processing in parallel with the SIMD arithmetic processing of the SIMD type microprocessor,
selecting and reading a plurality of arbitrary data items from among the data stored in the general-purpose registers built into the processor elements, and then writing the plurality of data items back to the general-purpose registers,
wherein the interval between the processor elements in which the plurality of data items are stored by the write-back is smaller than the interval between the processor elements in which the plurality of data items were stored at the time of reading.
Data thinning device.

A data expansion device provided in a SIMD type microprocessor including a plurality of processor elements,
the data expansion device being connected to a data transfer port for accessing, from outside the microprocessor, the general-purpose registers built into the processor elements,
being controlled by a sequencer included therein,
operating automatically in response to an operation-start instruction setting made by a global processor included in the SIMD type microprocessor, thereby performing its processing in parallel with the SIMD arithmetic processing of the SIMD type microprocessor,
selecting and reading a plurality of arbitrary data items from among the data stored in the general-purpose registers built into the processor elements, and then writing the plurality of data items back to the general-purpose registers,
wherein the interval between the processor elements in which the plurality of data items are stored by the write-back is larger than the interval between the processor elements in which the plurality of data items were stored at the time of reading.
Data expansion device.

In the read data, data stored in one bit at a predetermined position, or data stored in a plurality of bits at a predetermined position, is selected, and
when writing back to a general-purpose register, a plurality of sets of the selected data are stored in a single general-purpose register.
The data thinning device according to claim 1.

Predetermined data, other than the arbitrary data that is selected, stored in the general-purpose registers built into the processor elements located near a first end of the row formed by all the processor elements is read, and
that data is written back to the general-purpose registers built into the processor elements located near a second end of the row formed by all the processor elements;
at the same time,
predetermined data, other than the arbitrary data that is selected, stored in the general-purpose registers built into the processor elements located near the second end of the row formed by all the processor elements is read, and
that data is written back to the general-purpose registers built into the processor elements located near the first end of the row formed by all the processor elements.
The data thinning device according to claim 1.

The read data is divided into 1-bit data or multi-bit data, and
the plurality of sets of data formed by the division are written back to general-purpose registers built into different processor elements.
The data expansion device according to claim 2.

With respect to a first processor element in which data is stored by the write-back and a second processor element in which data is stored by the write-back,
when one or more consecutive third processor elements exist between the first processor element and the second processor element, and no data is written back to those third processor elements,
the data written back to the first processor element, or the data written back to the second processor element,
is copied and written into the general-purpose registers built into the third processor elements.
The data expansion device according to claim 2.

The device includes a memory unit whose bit capacity is at least the number of bits equal to the number of minimum constituent units contained in the data to be reduced,
each minimum constituent unit of the data to be reduced that is to be thinned out is associated with a bit position of the memory unit, and predetermined data is stored at the corresponding bit position, and
on the basis of the stored predetermined data,
the reading process from the data stored in the general-purpose registers built into the processor elements and the subsequent write-back process to the general-purpose registers are performed.
The data thinning device according to claim 1.

The device includes a memory unit whose bit capacity is at least the number of bits equal to the number of minimum constituent units contained in the data after enlargement processing,
each minimum constituent unit of the data after enlargement processing that is subject to enlargement control is associated with a bit position of the memory unit, and predetermined data is stored at the corresponding bit position, and
on the basis of the stored predetermined data,
the reading process from the data stored in the general-purpose registers built into the processor elements and the subsequent write-back process to the general-purpose registers are performed.
The data expansion device according to claim 2.