JP2004303058A

JP2004303058A - Vector processor and its data processing method

Info

Publication number: JP2004303058A
Application number: JP2003097068A
Authority: JP
Inventors: Daisuke Miyakoshi; 大輔宮腰
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2003-03-31
Filing date: 2003-03-31
Publication date: 2004-10-28

Abstract

PROBLEM TO BE SOLVED: To reduce resources and power consumption by eliminating wasteful latching into a register included in an arithmetic unit when a chaining path is necessary.
SOLUTION: A chaining circuit 42 of a register read unit 4-1 determines whether the register storing the scalar data used in the vector operation of an arithmetic unit 1-1 matches the register storing the scalar data that is the operation result of an arithmetic unit 1-2, and generates a chaining enable signal SRCA1-CHEN when they match. In that case, the chaining circuit 42 uses the chaining enable signal SRCA1-CHEN to pass the output data DST2-DATA of the arithmetic unit 1-2 through the chaining path to the arithmetic unit 1-1, where a register 13 latches it.
COPYRIGHT: (C) 2005, JPO & NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、ベクトル演算を高速に実行するベクトルプロセッサおよびその演算処理方法に関するものである。
【０００２】
【従来の技術】
ベクトルプロセッサは、第１の演算器の出力を、第２の演算器の入力として使用する場合には、いったんレジスタを経由することで１サイクル余分な処理を行わずに、第１の演算器の出力を第２の演算器の入力側にパスする経路を設けることで実行サイクルを短縮する手法が、チェイニング手法として知られている。
【０００３】
このようなチェイニング手法を用いた従来装置としては、データ処理装置（特許文献１参照）や、ベクトル演算処理装置（特許文献２参照）などが知られている。
特許文献１に記載のデータ処理装置は専用レジスタを指定することにより、また、特許文献２に記載のベクトル演算処理装置は汎用レジスタの少なくとも１つを指定することにより、チェイニングパスを有効にする方法（手段）が開示されている。
【０００４】
【特許文献１】
特開昭６１−２３２７６号公報
【特許文献２】
特開昭５６−８８５６１号公報
【０００５】
【発明が解決しようとする課題】
しかし、特許文献１に記載のデータ処理装置では、専用のレジスタを持つようになっているので回路の増加となり、特に、チェイニングのパスを増加したい場合などには、回路規模が増加してしまうという不具合がある。
また、特許文献２に記載のベクトル演算処理装置では、汎用レジスタを使用しているので汎用性に優れるが、第１の演算結果が常に変化しないような演算（演算の途中で一度だけ入力を変化させたい場合など）でも、常に前の演算結果を取り込んでくるような形になってしまう。
【０００６】
ところで、ベクトルプロセッサにおいて、並列動作が可能な複数の演算器と、ベクトルデータを格納するベクトルレジスタと、スカラレジスタを格納するスカラレジスタとを備えるものが知られている。
このようなプロセッサにおいて、ベクトルレジスタ同士の演算は、例えば次の（１）式のようになる。
【０００７】
ΣＶＲｘ〔ｉ〕＋Ｖｒｙ〔ｉ〕・・・・（１）
この場合には、同じ要素同士を掛け合わせればよいが、その要素として値が一定になる場合が多い演算係数や定数の場合には、同一のデータをベクトルレジスタにいくつも重複して書き込む必要があり、リソースの有効利用という点から非効率的である。
【０００８】
一方、ベクトルレジスタとスカラレジスタの演算の場合には、例えば次の（２）式のようになる。
ΣＶＲｘ〔ｉ〕＋ＳＲｙ・・・・（２）
この場合には、第２のオペランドＳＲｙを変更したい場合には、いったんベクトル演算を停止させ、そのオペランドＳＲｙを書き換えてから新たに演算を行うなどの特別な操作が必要となるため、命令コードの複雑化が考えられる。
【０００９】
換言すると、演算の途中において、一度だけ値を書き込むような第１の演算（スカラ演算）の演算結果を、第２のベクトル演算の入力としたい場合には、従来のチェイニング手法では、第１の演算をベクトル化するなどのソフトウエア側での工夫が必要であると共に、必要でない演算をも行うために消費電力の増加などのデメリットも考えられる。
【００１０】
そこで、本発明の目的は、チェイニングパスが必要なときに、演算器に含まれるレジスタへの無駄なラッチをなくすようにし、リソースの削減および消費電力の低減化を図ることができるベクトルプロセッサおよびその演算処理方法を提供することにある。
【００１１】
【課題を解決するための手段】
上記の課題を解決し本発明の目的を達成するために、各発明は、以下のように構成した。
すなわち、第１の発明は、ベクトルデータとスカラデータをそれぞれ格納する複数からなるレジスタ群と、ベクトルデータまたはスカラデータを用いて２つのデータ間の演算を行うｎ個の演算器と、前記ｎ個の演算器の各演算結果を取り込み、チェイニングが必要なときにその各演算結果を所望の演算器の入力側に供給する２ｎ個のチェイニング回路とを備え、前記各チェイニング回路は、前記チェイニングが可能なときに対応する演算器に対してチェイニング・イネーブル信号を生成するようになっており、前記各演算器は、前記レジスタ群からのデータまたは前記ｎ個の演算器からの演算データのうちの１つを一時的に格納する記憶素子を含み、前記記憶素子は、前記チェイニング・イネーブル信号がある場合にはそのチェイニングに係る演算器からの演算データを記憶するようにした。
【００１２】
また、第２の発明は、第１演算器でベクトル演算の演算中に、第２演算器のスカラ演算の演算結果を使用できるベクトルプロセッサにおいて、前記第１演算器のベクトル演算で使用しているスカラデータと、前記第２演算器の演算結果であるスカラデータと、を格納するレジスタが一致したか否かを検出し、一致したときにはチェイニング・イネーブル信号を生成するようにし、前記チェイニング・イネーブル信号が生成された場合に、前記第２演算器からの出力データを前記第１演算器にパスさせて、その第１演算器が有するパイプラインレジスタにラッチさせるようにした。
【００１３】
このように本発明は、少なくとも１つの汎用レジスタに対してチェイニングパスを設けるとともに、チェイニングパスが、そのレジスタに対するデータの書き込み要求と読み出し要求とが重なった場合に有効になるようにし、そのデータをチェイニングに係る演算器に伝えるようにした。
このため、本発明によれば、チェイニングパスが必要ときに、演算器が有するレジスタの無駄なラッチがなくなり、リソースの削減および消費電力の低減化を図ることができる。
【００１４】
【発明の実施の形態】
以下、本発明の実施の形態について図面を参照して説明する。
図１は、本発明のベクトルプロセッサの実施形態の構成を示すブロック図である。
この実施形態に係るベクトルプロセッサは、図１に示すように、ｎ個からなる演算器１−１、１−１・・・・１−ｎと、レジスタライト回路２と、レジスタ群３と、ｎ個のレジスタリードユニット４−１、４−２・・・・４−ｎおよびｎ個のレジスタリードユニット５−１、５−２・・・・５−ｎからなるレジスタリード回路６と、プログラム制御回路７と、を備えている。
【００１５】
演算回路１−１は演算を行う回路であり、マルチプレクサ１１、１２と、記憶素子であるレジスタ（パイプラインレジスタ）１３、１４と、演算ブロック１５と、制御ブロック１６とを備えている。
マルチプレクサ１１には、プログラム制御回路７からのイネーブル信号ＳＲＣＡ１−ＥＮと、レジスタリードユニット４−１からのチェイニング・イネーブル信号ＳＲＣＡ１−ＣＨＥＮが供給され、その両信号は制御ブロック１６からの制御により後述のように選択されるようになっている。
【００１６】
マルチプレクサ１２には、プログラム制御回路７からのイネーブル信号ＳＲＣＢ１−ＥＮと、レジスタリードユニット５−１からのチェイニング・イネーブル信号ＳＲＣＢ１−ＣＨＥＮが供給され、その両信号は制御ブロック１６からの制御により選択されるようになっている。
ここで、イネーブル信号ＳＲＣＡ１−ＥＮ、ＳＲＣＢ１−ＥＮは、演算器１−１に対するデータの書き込みを許可する信号である。
【００１７】
レジスタ１３は、レジスタリードユニット４−１からのデータＳＲＣＡ１−ＤＡＴＡを記憶するものであり、マルチプレクサ１１からのイネーブル信号ＳＲＣＡ１−ＥＮまたはレジスタリードユニット４−１からのチェイニング・イネーブル信号ＳＲＣＡ１−ＣＨＥＮにより動作するようになっている。
レジスタ１４は、レジスタリードユニット５−１からのデータＳＲＣＢ１−ＤＡＴＡを記憶するものであり、マルチプレクサ１２からのイネーブル信号ＳＲＣＢ１−ＥＮまたはレジスタリードユニット５−１からのチェイニング・イネーブル信号ＳＲＣＢ１−ＣＨＥＮにより動作するようになっている。
【００１８】
演算ブロック１５は、レジスタ１３およびレジスタ１４に記憶されるデータを用いて所望の演算を行うものである。この演算ブロック１５（演算器１−１）の演算結果である演算データＤＳＴ１−ＤＡＴＡは、レジスタライト回路２、レジスタリードユニット４−１〜４−ｎの各チェイニング回路４２、およびレジスタリードユニット５−１〜５−ｎの各チェイニング回路（図示せず）に供給されるようになっている。
【００１９】
制御ブロック１６は、プログラム制御回路７からの指令に基づき、演算命令がベクトル命令またはスカラ命令であるかを判定し、この判定の結果、スカラ命令の場合には、マルチプレクサ１１またはマルチプレクサ１２がイネーブル信号ＳＲＣＡ１−ＥＮまたはイネーブル信号ＳＲＣＡ１−ＥＮを選択してレジスタ１３またはレジスタ１４に出力する制御を行う。また、その判定の結果、ベクトル命令の場合には、マルチプレクサ１１またはマルチプレクサ１２がチェイニング・イネーブル信号ＳＲＣＡ１−ＥＮまたはチェイニング・イネーブル信号ＳＲＣＡ１−ＥＮを選択してレジスタ１３またはレジスタ１４に出力する制御を行う。
【００２０】
また、制御ブロック１６は、演算ブロック１６からの演算データＤＳＴ１−ＤＡＴＡの出力を許可するイネーブル信号ＤＳＴ１−ＥＮを、レジスタライト回路２に出力するようになっている。
演算器１−２〜１−ｎの内部の構成は、演算器１−１と同様であるが、その入力信号と出力信号が異なるので、その各種の信号との関係について以下に説明する。
【００２１】
演算器１−２〜１−ｎには、レジスタリードユニット４−２〜４−ｎから出力されるデータＳＲＣＡ２−ＤＡＴＡ〜ＳＲＣＡｎ−ＤＡＴＡ、およびレジスタリードユニット５−２〜５−ｎから出力されるデータＳＲＣＢ２−ＤＡＴＡ〜ＳＲＣＢｎ−ＤＡＴＡが入力されるようになっている。
また、演算器１−２〜１−ｎには、プログラム制御回路７から出力されるイネーブル信号ＳＲＣＡ２−ＥＮ〜ＳＲＣＡｎ−ＥＮ、およびイネーブル信号ＳＲＣＢ２−ＥＮ〜ＳＲＢＡｎ−ＥＮが入力されるようになっている。
【００２２】
さらに、演算器１−２〜１−ｎには、レジスタリードユニット４−２〜４−ｎから出力されるチェイニング・イネーブル信号ＳＲＣＡ２−ＣＨＥＮ〜ＳＲＣＡｎ−ＣＨＥＮ、およびレジスタリードユニット５−２〜５−ｎから出力されるチェイニング・イネーブル信号ＳＲＣＢ２−ＣＨＥＮ〜ＳＲＣＢｎ−ＣＨＥＮから出力されるようになっている。
【００２３】
また、演算器１−２〜１−ｎからの演算データＤＳＴ２−ＤＡＴＡ〜ＤＳＴｎ−ＤＡＴＡは、レジスタライト回路２、レジスタリードユニット４−１〜４−ｎの各チェイニング回路４２、およびレジスタリードユニット５−１〜５−ｎの各チェイニング回路（図示せず）に供給されるようになっている。
さらに、演算器１−２〜１−ｎからのイネーブル信号ＤＳＴ２−ＥＮ〜ＤＳＴｎ−ＥＮは、レジスタライト回路２に出力するようになっている。
【００２４】
レジスタライト回路２は、演算器１−１〜１−ｎからのイネーブル信号ＤＳＴ１−ＥＮ〜ＤＳＴｎ−ＥＮと、プログラム制御回路７からのセル指定信号ＤＳＴ１−ＳＥＬ〔ｍ〕〜ＤＳＴｎ−ＳＥＬ〔ｍ〕とに基づき、レジスタ群３の所望のレジスタに格納する書き込み信号ＷＲ−ＥＮ〔ｍ〕と、その所望のレジスタに格納する書き込みデータＷＲ−ＤＡＴＡ〔ｍ〕を生成するようになっている。
【００２５】
また、レジスタライト回路２は、演算器１−１〜１−ｎからの各イネーブル信号ＤＳＴ１−ＥＮ〜ＤＳＴｎ−ＥＮと、これに対応するプログラム制御回路７からの各セル指定信号ＤＳＴ１−ＳＥＬ〔ｍ〕〜ＤＳＴｎ−ＳＥＬ〔ｍ〕との間で論理積演算を行い、各書き込み信号ＤＳＴ１−ＷＥ〔ｍ〕〜ＤＳＴｎ−ＷＥ〔ｍ〕を生成するようになっている。例えば、イネーブル信号ＤＳＴ１−ＥＮとセル指定信号ＤＳＴ１−ＳＥＬ〔ｍ〕との間で論理積演算を行い、書き込み信号ＤＳＴ１−ＷＥ〔ｍ〕を生成するようになっている。
【００２６】
レジスタライト回路２で生成される書き込み信号ＤＳＴ１−ＷＥ〔ｍ〕〜ＤＳＴｎ−ＷＥ〔ｍ〕は、レジスタリードユニット４−１〜４−ｎの各チェイニング回路４２、およびレジスタリードユニット５−１〜５−ｎの各チェイニング回路（図示せず）に供給されるようになっている。
レジスタ群３は、演算器１−１〜１−ｎからの各演算データＤＳＴ１−ＤＡＴＡ〜ＤＳＴｎ−ＤＡＴＡを格納するためのｍ個のレジスタからなる。このｍ個のレジスタは、ベクトルデータを格納するベクトルレジスタと、スカラデータを格納するスカラレジスタとからなる。
【００２７】
このレジスタ群３のｍ個のレジスタの各データは、レジスタリードユニット４−１〜４−ｎの各リードレジスタ選択回路４１、およびレジスタリードユニット５−１〜５−ｎの各リードレジスタ選択回路（図示せず）に供給されるようになっている。
レジスタリードユニット４−１〜４−ｎは、レジスタ群３のうちの所望のレジスタのデータまたは演算器１−１〜１−ｎからのデータＤＳＴ１−ＤＡＴＡ〜ＤＳＴｎ−ＤＡＴＡのうちの１つを選択的に読み出し、この読み出したデータＳＲＣＡ１−ＤＡＴＡ〜ＳＲＣＡｎ−ＤＡＴＡを、対応する演算器１−１〜１−ｎに供給するようになっている。
【００２８】
また、レジスタリードユニット４−１〜４−ｎは、チェイニング回路４２により後述のようにチェイニング・イネーブル信号ＳＲＣＡ１−ＣＨＥＮ〜ＳＲＣＡｎ−ＣＨＥＮを生成し、これを対応する演算器１−１〜１−ｎに供給するようになっている。
このため、レジスタリードユニット４−１は、図１に示すように、リードレジスタ選択回路４１と、チェイニング回路４２と、マルチプレクサ（ＭＵＸ）４３とを備えている。
【００２９】
リードレジスタ選択回路４１は、プログラム制御回路７からのイネーブル信号ＳＲＣＡ１−ＥＮとセル選択信号ＳＲＣＡ１−ＳＥＬ〔ｍ〕に基づき、レジスタ群３のｍ個のレジスタのうちの１つからのデータを選択してマルチプレクサ４３に供給するようになっている。
チェイニング回路４２は、レジスタライト回路２からの書き込み信号ＤＳＴ１−ＷＥ〔ｍ〕〜ＤＳＴｎ−ＷＥ〔ｍ〕と、リードレジスタ選択回路４１からの読み出し信号ＳＲＣＡ１−ＲＥ〔ｍ〕とに基づいてチェイニング・イネーブル信号ＳＲＣＡ１−ＣＨＥＮを生成するとともに、チェイニング・イネーブル信号ＳＲＣＡ１−ＣＨＥＮが生成されたときには、演算器１−１〜１−ｎからのデータＤＳＴ１−ＤＡＴＡ〜ＤＳＴｎ−ＤＡＴＡのうちの１つが、マルチプレクサ４３に供給されるようになっている。
【００３０】
マルチプレクサ４３は、通常はリードレジスタ選択回路４１で選択されたレジスタ群３のレジスタのうちの１のデータを出力し、チェイニング・イネーブル信号ＳＲＣＡ１−ＣＨＥＮが出力されたときには、チェイニング回路４２から出力されるデータＤＳＴ１−ＤＡＴＡ〜ＤＳＴｎ−ＤＡＴＡのうちの１つを出力させるものである。
【００３１】
ここで、レジスタリードユニット４−２〜４−ｎは、その内部構成はレジスタリードユニット４−１と同様であるのでその説明は省略し、その入力信号と出力信号は図示の通りである。
次に、レジスタリードユニット５−１〜５−ｎは、レジスタ群３のうちの所望のレジスタのデータまたは演算器１−１〜１−ｎからのデータＤＳＴ１−ＤＡＴＡ〜ＤＳＴｎ−ＤＡＴＡのうちの１つを選択的に読み出し、この読み出したデータＳＲＣＢ１−ＤＡＴＡ〜ＳＲＣＢｎ−ＤＡＴＡを、対応する演算器１−１〜１−ｎに供給するようになっている。
【００３２】
また、レジスタリードユニット５−１〜５−ｎは、チェイニング回路（図示せず）によりチェイニング・イネーブル信号ＳＲＣＢ１−ＣＨＥＮ〜ＳＲＣＢｎ−ＣＨＥＮを生成し、これに対応する演算器１−１〜１−ｎに供給するようになっている。
ここで、レジスタリードユニット５−１〜５−ｎの内部構成は、レジスタリードユニット４−１〜４−ｎと同様であるので、その説明は省略する。
【００３３】
プログラム制御回路７は、ベクトル命令またスカラ命令に基づいて、演算器１−１〜１−ｎがその命令に基づいて演算を行い、その演算結果をレジスタ群３の所望のレジスタへ格納したり、またはその演算結果をそのまま演算器１−１〜１−ｎに入力させたりするために、各部に上記のような各信号を供給する回路である。
【００３４】
次に、チェイニング回路４２の具体的な構成について、図２を参照して説明する。
このチェイニング回路４２は、ｎ個のコンパレータ（比較器）４２１−１〜４２１−ｎと、ｎ個のアンド回路４２２−１〜４２２−ｎと、オア回路４２３と、オア回路４２４とを備えている。
【００３５】
コンパレータ４２１−１〜４２１−ｎの一方の各入力端子には、レジスタライト回路３からの各書き込み可能信号ＤＳＴ１−ＷＥ〔ｍ〕〜ＤＳＴｎ−ＷＥ〔ｍ〕がそれぞれ入力され、その他方の各入力端子にはリードレジスタ選択回路４１からの読み出し信号ＳＲＣＡ１−ＲＥ〔ｍ〕が入力されるようになっている。
コンパレータ４２１−１〜４２１−ｎからの各出力信号は、オア回路４２３に供給されて論理和演算行われ、その演算結果がチェイニング・イネーブル信号ＳＲＣＡ−ＣＨＥＮとして出力されるようになっている。また、コンパレータ４２１−１〜４２１−ｎからの各出力信号は、対応するアンド回路４２２−１〜４２２−ｎの一方の各入力端子に供給されるようになっている。
【００３６】
さらに、アンド回路４２２−１〜４２２−ｎの他方の各入力端子には、演算器１−１〜１−ｎからのデータＤＳＴ１−ＤＡＴＡ〜ＤＳＴｎ−ＤＡＴＡが入力されるようになっている。また、アンド回路４２２−１〜４２２−ｎの各出力信号は、オア回路４２３に供給されて論理和演算行われ、その演算結果がデータＳＲＣＡ−ＤＡＴＡとして出力されるようになっている。
【００３７】
次に、このような構成からなる実施形態の演算処理について、図１および図２を参照して説明する。
この演算処理例では、演算器１−１でベクトル演算の演算中に、演算器１−２のスカラ演算の演算結果を入力として使用する場合であって、演算器１−１のベクトル演算で使用しているスカラデータと、演算器１−２の演算結果であるスカラデータとを格納するレジスタが一致する場合について説明する。
【００３８】
この場合には、演算器１−１のレジスタ１３にはスカラデータが格納され、レジスタ１４にはベクトルデータが格納され、この両データが演算ブロック１５で演算されているものとする。演算器１−１は、その演算結果であるデータＤＳＴ１−ＤＡＴＡを、レジスタライト回路２、レジスタリードユニット４−１〜４−ｎ、およびレジスタリードユニット５−１〜５−ｎにそれぞれ出力する。
【００３９】
ここで、レジスタ１３に格納されるスカラデータは、レジスタ群３中の任意のレジスタに格納されているものが使用される。
このように、演算器１−１のベクトル演算の演算中に、演算器１−２のスカラ演算の演算結果であるスカラデータＤＳＴ２−ＤＡＴＡを、演算器１−１にチェイニングパスさせる場合の動作について説明する。
【００４０】
この場合には、そのスカラデータＤＳＴ２−ＤＡＴＡは、レジスタライト回路２によりレジスタ群３中のうちの上記と同じレジスタに格納されるとともに、レジスタリードユニット４−１のチェイニング回路４２によりパスされてデータＳＲＣＡ１−ＤＡＴＡとなり、演算器１−１のレジスタ１３の内容が書き換えられる。
【００４１】
すなわち、レジスタライト回路２は、演算器１−２からのイネーブル信号ＤＳＴ２−ＥＮと、プログラム制御回路７からセル指定信号ＤＳＴ２−ＳＥＬ〔ｍ〕に基づき、レジスタ群３の上記のレジスタに格納する書き込み信号ＷＲ−ＥＮ〔ｍ〕と、その上記のレジスタに格納する書き込みデータＷＲ−ＤＡＴＡ〔ｍ〕を生成する。この結果、レジスタライト回路２は、スカラデータＤＳＴ２−ＤＡＴをその上記のレジスタに格納する。
【００４２】
また、レジスタライト回路２は、演算器１−１〜１−ｎからの各イネーブル信号ＤＳＴ１−ＥＮ〜ＤＳＴｎ−ＥＮと、これに対応するプログラム制御回路７からの各セル指定信号ＤＳＴ１−ＳＥＬ〔ｍ〕〜ＤＳＴｎ−ＳＥＬ〔ｍ〕との間でそれぞれ論理積演算を行い、各書き込み信号ＤＳＴ１−ＷＥ〔ｍ〕〜ＤＳＴｎ−ＷＥ〔ｍ〕を生成し、この各生成信号をレジスタリードユニット４−１のチェイニング回路４２に出力する。
【００４３】
この場合には、セル指定信号ＤＳＴ２−ＳＥＬ〔ｍ〕が、イネーブル信号ＤＳＴ２−ＥＮによって書き込み信号ＤＳＴ２−ＷＥ〔ｍ〕として出力される。
さらに、リードレジスタ選択回路４１は、プログラム制御回路７からのイネーブル信号ＳＲＣＡ１−ＥＮとセル選択信号ＳＲＣＡ１−ＳＥＬ〔ｍ〕との論理積演算を行い、その演算結果である読み出し信号データＳＲＣＡ−ＲＥ〔ｍ〕をチェイニング回路４２に出力する。
【００４４】
図２に示すチェイニング回路４２のコンパレータ４２１−１〜４２１−ｎの一方の各入力端子には、レジスタライト回路３からの各書き込み可能信号ＤＳＴ１−ＷＥ〔ｍ〕〜ＤＳＴｎ−ＷＥ〔ｍ〕がそれぞれ入力され、その他方の各入力端子にはリードレジスタ選択回路４１からの読み出し信号ＳＲＣＡ１−ＲＥ〔ｍ〕が入力される。
【００４５】
この場合には、コンパレータ４２１−２に入力される書き込み可能信号ＤＳＴ２−ＷＥ〔ｍ〕と読み出し信号ＳＲＣＡ１−ＲＥ〔ｍ〕が一致するので、その一致信号がオア回路４２３を経由してチェイニング・イネーブル信号ＳＲＣＡ１−ＣＨＥＮとして出力される。このチェイニング・イネーブル信号ＳＲＣＡ１−ＣＨＥＮは、マルチプレクサ４３と演算器１−１のマルチプレクサ１１とにそれぞれ供給される。
【００４６】
また、アンド回路４２２−２は、その一方の入力端子にコンパレータ４２１−２からの上記の一致信号が入力され、その他方の入力端子に演算器１−２で演算された出力データＤＳＴ２−ＤＡＴＡが供給されている。このため、アンド回路４２２−２からは、上記の一致信号によってその出力データＤＳＴ２−ＤＡＴＡが出力され、この出力データがマルチプレクサ４３に供給される。
【００４７】
マルチプレクサ４３は、チェイニング・イネーブル信号ＳＲＣＡ１−ＣＨＥＮにより、チェイニング回路４２から出力される出力データをＤＳＴ２−ＤＡＴＡを演算器１−１のレジスタ１３に転送可能とする。
このとき、チェイニング回路４２からチェイニング・イネーブル信号ＳＲＣＢ−ＣＨＥＮがマルチプレクサを経由してレジスタ１３のイネーブル信号として使用されている。このため、その出力データＤＳＴ２−ＤＡＴＡは、レジスタ１３に格納される。
【００４８】
従って、演算器１−１は、レジスタ１４に格納されるベクトルデータと、上記のように演算器１−２の演算結果をパスさせてレジスタ１３に格納させたスカラデータとを用いてベクトル演算を行うことができる。
以上説明したように、この実施形態によれば、演算器１−１でベクトル演算の演算中に演算器１−２のスカラ演算の演算結果を入力として使用する場合のように、チェイニングが起こった場合に演算器のレジスタに必要なデータをラッチするようにした。このため、チェイニングが成立していない場合には、無駄なデータを演算器のレジスタにラッチせずに済むので、低消費電力化を実現することができる。
【００４９】
次に、上記の実施形態に係るベクトルプロッセを、画像処理の１つであるＤＣＴ演算処理に適用した場合について説明する。
ＤＣＴ（離散コサイン変換）は、画像データとＤＣＴ係数を掛け合わせる以下の（３）式のような演算処理である。
ΣＶＲ〔ｉ〕＊ＳＲ・・・・（３）
ここで、ＶＲ〔ｉ〕は画像データを格納したベクトルレジスタを示し、ＳＲはＤＣＴ係数を格納したスカラレジスタを示す。
【００５０】
ここで、ＤＣＴ係数にベクトルレジスタを使用しても良いが、画像データの高周波数側の係数の殆どが「０」である場合が多いため、ベクトルレジスタのリソース（資源）の節約という観点から、本実施例では「０」以外の値が示された場合にのみスカラレジスタに書き込みを行うことができる。
このため、スカラレジスタに書き込みがあった場合にのみ演算器側のパイプライン処理用のレジスタも動作させ、リソースの節約と同時に無駄な回路を動作させることなく、消費電力の低減化を図ることができる。
【００５１】
【発明の効果】
以上説明したように、本発明によれば、チェイニングパスが必要ときに、演算器が有するレジスタの無駄なラッチがなくなり、リソースの削減および消費電力の低減化を図ることができる。
【図面の簡単な説明】
【図１】本発明の実施形態の構成を示すブロック図である。
【図２】図１のチェイニング回路の具体的な構成を示す回路図である。
【符号の説明】
１−１〜１−ｎは演算器、２はレジスタライト回路、３はレジスタ群、４−１〜４−ｎはレジスタリードユニット、５−１〜５−ｎはレジスタリードユニット、６はレジスタリード回路、７はプログラム制御回路、１１、１２はマルチプレクサ、１３、１４はレジスタ１３、１４、１５は演算ブロック、１６は制御ブロック、４１はリードレジスタ選択回路、４２はチェイニング回路、４３はマルチプレクサである。
[0001]
[Technical field of the invention]
The present invention relates to a vector processor that executes vector operations at high speed, and to an arithmetic processing method for that processor.
[0002]
[Prior art]
In a vector processor, when the output of a first arithmetic unit is used as the input of a second arithmetic unit, routing the value through a register costs one extra cycle. A technique that avoids this extra cycle by providing a path that passes the output of the first arithmetic unit directly to the input side of the second arithmetic unit, thereby shortening the execution cycle, is known as chaining.
[0003]
As a conventional device using such a chaining method, a data processing device (see Patent Document 1), a vector operation processing device (see Patent Document 2), and the like are known.
Patent Document 1 discloses a data processing device that enables a chaining path by designating a dedicated register, and Patent Document 2 discloses a vector arithmetic processing device that enables a chaining path by designating at least one general-purpose register.
[0004]
[Patent Document 1]
JP-A-61-23276
[Patent Document 2]
JP-A-56-88561
[0005]
[Problems to be solved by the invention]
However, because the data processing device of Patent Document 1 has dedicated registers, its circuitry grows; in particular, the circuit scale increases whenever more chaining paths are wanted.
The vector arithmetic processing device of Patent Document 2 uses general-purpose registers and is therefore highly versatile. However, even for an operation in which the first operation result does not change every time (for example, when the input is to be changed only once in the middle of the operation), it ends up taking in the previous operation result every time.
[0006]
Meanwhile, vector processors are known that include a plurality of arithmetic units capable of operating in parallel, vector registers that store vector data, and scalar registers that store scalar data.
In such a processor, the operation between the vector registers is, for example, as shown in the following equation (1).
[0007]
ΣVRx [i] + VRy [i] ... (1)
In this case, elements with the same index need only be multiplied together. However, when one operand is an arithmetic coefficient or a constant whose value is usually fixed, the same data must be written into the vector register many times over, which is inefficient in terms of effective use of resources.
[0008]
On the other hand, an operation between a vector register and a scalar register takes, for example, the form of equation (2) below.
ΣVRx [i] + SRy ... (2)
In this case, changing the second operand SRy requires a special procedure, such as halting the vector operation, rewriting the operand SRy, and then starting a new operation, which may complicate the instruction code.
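The trade-off between equations (1) and (2) can be illustrated with a minimal Python sketch. The list-based register model, the function names, and the reading of Σ as a sum over all elements i are assumptions made for illustration and are not taken from the patent.

```python
def vector_vector_sum(vrx, vry):
    """Equation (1), read as the sum over i of (VRx[i] + VRy[i])."""
    return sum(x + y for x, y in zip(vrx, vry))


def vector_scalar_sum(vrx, sry):
    """Equation (2), read as the sum over i of (VRx[i] + SRy)."""
    return sum(x + sry for x in vrx)


vrx = [1, 2, 3, 4]
constant = 10

# Vector-vector form: the constant must be replicated once per element.
vry = [constant] * len(vrx)
replicated = vector_vector_sum(vrx, vry)

# Vector-scalar form: the constant occupies a single scalar register.
shared = vector_scalar_sum(vrx, constant)

print(replicated, shared, len(vry))
```

Both forms give the same result here, but the first spends one vector element per copy of the constant, while the second needs a way to update the scalar operand mid-operation, which is the gap the chaining path is meant to fill.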
[0009]
In other words, when the result of a first operation (a scalar operation) that writes a value only once in the middle of processing is to be used as an input to a second, vector operation, the conventional chaining technique requires software-side workarounds such as vectorizing the first operation, and also has drawbacks such as increased power consumption because unnecessary operations are performed.
[0010]
Therefore, an object of the present invention is to provide a vector processor, and an arithmetic processing method therefor, that eliminate wasteful latching into the registers included in an arithmetic unit when a chaining path is required, thereby reducing resources and power consumption.
[0011]
[Means for Solving the Problems]
In order to solve the above problems and achieve the object of the present invention, each invention is configured as follows.
That is, the first invention comprises: a register group made up of a plurality of registers that store vector data and scalar data, respectively; n arithmetic units that each perform an operation between two data items using vector data or scalar data; and 2n chaining circuits that take in the operation results of the n arithmetic units and, when chaining is required, supply those results to the input side of a desired arithmetic unit. Each chaining circuit generates a chaining enable signal for its corresponding arithmetic unit when chaining is possible. Each arithmetic unit includes a storage element that temporarily stores either data from the register group or one of the operation data from the n arithmetic units, and when the chaining enable signal is present, the storage element stores the operation data from the arithmetic unit involved in the chaining.
[0012]
The second invention is a vector processor in which a first arithmetic unit, while executing a vector operation, can use the result of a scalar operation of a second arithmetic unit. It detects whether the register storing the scalar data used in the vector operation of the first arithmetic unit matches the register storing the scalar data that is the operation result of the second arithmetic unit, and generates a chaining enable signal when they match. When the chaining enable signal is generated, the output data from the second arithmetic unit is passed to the first arithmetic unit and latched in a pipeline register of the first arithmetic unit.
[0013]
As described above, the present invention provides a chaining path for at least one general-purpose register, makes the chaining path effective when a write request and a read request for that register coincide, and delivers the data to the arithmetic unit involved in the chaining.
Consequently, when a chaining path is required, needless latching of the registers in the arithmetic units is eliminated, which reduces resources and power consumption.
[0014]
[Best mode for carrying out the invention]
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
FIG. 1 is a block diagram showing a configuration of an embodiment of a vector processor according to the present invention.
As shown in FIG. 1, the vector processor according to this embodiment includes n arithmetic units 1-1, 1-2, ..., 1-n; a register write circuit 2; a register group 3; a register read circuit 6 made up of n register read units 4-1, 4-2, ..., 4-n and n register read units 5-1, 5-2, ..., 5-n; and a program control circuit 7.
[0015]
The arithmetic unit 1-1 is a circuit that performs operations, and includes multiplexers 11 and 12, registers (pipeline registers) 13 and 14 serving as storage elements, an arithmetic block 15, and a control block 16.
The multiplexer 11 is supplied with an enable signal SRCA1-EN from the program control circuit 7 and a chaining enable signal SRCA1-CHEN from the register read unit 4-1, and one of the two signals is selected under the control of the control block 16, as described later.
[0016]
The multiplexer 12 is supplied with an enable signal SRCB1-EN from the program control circuit 7 and a chaining enable signal SRCB1-CHEN from the register read unit 5-1, and one of the two signals is selected under the control of the control block 16.
Here, the enable signals SRCA1-EN and SRCB1-EN are signals that permit writing of data to the arithmetic unit 1-1.
[0017]
The register 13 stores the data SRCA1-DATA from the register read unit 4-1, and operates in response to the enable signal SRCA1-EN from the multiplexer 11 or the chaining enable signal SRCA1-CHEN from the register read unit 4-1.
The register 14 stores the data SRCB1-DATA from the register read unit 5-1, and operates in response to the enable signal SRCB1-EN from the multiplexer 12 or the chaining enable signal SRCB1-CHEN from the register read unit 5-1.
[0018]
The arithmetic block 15 performs a desired operation using the data stored in the registers 13 and 14. The operation data DST1-DATA, which is the result produced by the arithmetic block 15 (arithmetic unit 1-1), is supplied to the register write circuit 2, to the chaining circuits 42 of the register read units 4-1 to 4-n, and to the chaining circuits (not shown) of the register read units 5-1 to 5-n.
[0019]
Based on a command from the program control circuit 7, the control block 16 determines whether the operation instruction is a vector instruction or a scalar instruction. For a scalar instruction, it controls the multiplexer 11 or the multiplexer 12 to select the enable signal SRCA1-EN or SRCB1-EN and output it to the register 13 or the register 14. For a vector instruction, it controls the multiplexer 11 or the multiplexer 12 to select the chaining enable signal SRCA1-CHEN or SRCB1-CHEN and output it to the register 13 or the register 14.
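The enable selection performed by the control block can be sketched in Python as follows. This is only a behavioral model under stated assumptions: the class `PipelineRegister`, the function `select_enable`, and the cycle-by-cycle stimulus are hypothetical names and values introduced for illustration, and signals are modeled as 0/1 integers.

```python
class PipelineRegister:
    """Hypothetical model of register 13 or 14: latches only when enabled."""

    def __init__(self, value=0):
        self.value = value
        self.latch_count = 0

    def clock(self, enable, data):
        # The register keeps its old value unless the enable is asserted.
        if enable:
            self.value = data
            self.latch_count += 1


def select_enable(is_vector_instruction, en, chen):
    """Model of multiplexer 11 or 12 under control of the control block."""
    return chen if is_vector_instruction else en


reg13 = PipelineRegister(value=7)

# Four cycles of a vector instruction; chaining is detected only in cycle 2.
for chen, data in [(0, 0), (0, 0), (1, 42), (0, 0)]:
    enable = select_enable(is_vector_instruction=True, en=1, chen=chen)
    reg13.clock(enable, data)

print(reg13.value, reg13.latch_count)
```

In this model the register latches once over four cycles, even though the ordinary enable stays asserted throughout, which is the behavior the description attributes to vector instructions.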
[0020]
Further, the control block 16 outputs to the register write circuit 2 an enable signal DST1-EN that permits output of the operation data DST1-DATA from the arithmetic block 15.
The internal configuration of each of the arithmetic units 1-2 to 1-n is the same as that of the arithmetic unit 1-1, but since the input signal and the output signal are different, the relationship between the various signals will be described below.
[0021]
The arithmetic units 1-2 to 1-n receive the data SRCA2-DATA to SRCAn-DATA output from the register read units 4-2 to 4-n and the data SRCB2-DATA to SRCBn-DATA output from the register read units 5-2 to 5-n.
The arithmetic units 1-2 to 1-n also receive the enable signals SRCA2-EN to SRCAn-EN and the enable signals SRCB2-EN to SRCBn-EN output from the program control circuit 7.
[0022]
Further, the arithmetic units 1-2 to 1-n receive the chaining enable signals SRCA2-CHEN to SRCAn-CHEN output from the register read units 4-2 to 4-n and the chaining enable signals SRCB2-CHEN to SRCBn-CHEN output from the register read units 5-2 to 5-n.
[0023]
The operation data DST2-DATA to DSTn-DATA from the arithmetic units 1-2 to 1-n are supplied to the register write circuit 2, to the chaining circuits 42 of the register read units 4-1 to 4-n, and to the chaining circuits (not shown) of the register read units 5-1 to 5-n.
Further, enable signals DST2-EN to DSTn-EN from the arithmetic units 1-2 to 1-n are output to the register write circuit 2.
[0024]
Based on the enable signals DST1-EN to DSTn-EN from the arithmetic units 1-1 to 1-n and the cell designation signals DST1-SEL [m] to DSTn-SEL [m] from the program control circuit 7, the register write circuit 2 generates a write signal WR-EN [m] for a desired register of the register group 3 and the write data WR-DATA [m] to be stored in that register.
[0025]
The register write circuit 2 also performs a logical AND between each of the enable signals DST1-EN to DSTn-EN from the arithmetic units 1-1 to 1-n and the corresponding cell designation signal DST1-SEL [m] to DSTn-SEL [m] from the program control circuit 7, thereby generating the write signals DST1-WE [m] to DSTn-WE [m]. For example, the enable signal DST1-EN is ANDed with the cell designation signal DST1-SEL [m] to generate the write signal DST1-WE [m].
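The AND operation that produces the per-unit write signals can be sketched as below. The sketch assumes that each cell designation signal is a one-hot vector of m bits and that signals are 0/1 integers; the function name `write_enables` and the example values are hypothetical.

```python
def write_enables(dst_en, dst_sel):
    """Return DSTk-WE[m] = DSTk-EN AND DSTk-SEL[m] for each arithmetic unit k."""
    return [[en & bit for bit in sel] for en, sel in zip(dst_en, dst_sel)]


# n = 2 arithmetic units, m = 4 registers.
dst_en = [0, 1]                        # only unit 1-2 has a result to write
dst_sel = [[0, 1, 0, 0],               # unit 1-1 points at register 1
           [0, 0, 1, 0]]               # unit 1-2 points at register 2

dst_we = write_enables(dst_en, dst_sel)
print(dst_we)
```

Because unit 1-1 has its enable deasserted, its write vector is all zeros, so only unit 1-2 presents a write to register 2.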
[0026]
The write signals DST1-WE [m] to DSTn-WE [m] generated by the register write circuit 2 are supplied to the chaining circuits 42 of the register read units 4-1 to 4-n and to the chaining circuits (not shown) of the register read units 5-1 to 5-n.
The register group 3 includes m registers for storing the operation data DST1-DATA to DSTn-DATA from the operation units 1-1 to 1-n. The m registers include a vector register for storing vector data and a scalar register for storing scalar data.
[0027]
The data of each of the m registers of the register group 3 is supplied to the read register selection circuits 41 of the register read units 4-1 to 4-n and to the read register selection circuits (not shown) of the register read units 5-1 to 5-n.
The register read units 4-1 to 4-n selectively read either the data of a desired register in the register group 3 or one of the data DST1-DATA to DSTn-DATA from the arithmetic units 1-1 to 1-n, and supply the data thus read, SRCA1-DATA to SRCAn-DATA, to the corresponding arithmetic units 1-1 to 1-n.
[0028]
Further, the register read units 4-1 to 4-n generate the chaining enable signals SRCA1-CHEN to SRCAn-CHEN by means of the chaining circuit 42, as described later, and supply them to the corresponding arithmetic units 1-1 to 1-n.
To this end, the register read unit 4-1 includes a read register selection circuit 41, a chaining circuit 42, and a multiplexer (MUX) 43, as shown in FIG. 1.
[0029]
Based on the enable signal SRCA1-EN and the cell selection signal SRCA1-SEL [m] from the program control circuit 7, the read register selection circuit 41 selects the data from one of the m registers in the register group 3 and supplies it to the multiplexer 43.
The chaining circuit 42 generates the chaining enable signal SRCA1-CHEN based on the write signals DST1-WE [m] to DSTn-WE [m] from the register write circuit 2 and the read signal SRCA1-RE [m] from the read register selection circuit 41. When the chaining enable signal SRCA1-CHEN is generated, one of the data DST1-DATA to DSTn-DATA from the arithmetic units 1-1 to 1-n is supplied to the multiplexer 43.
[0030]
The multiplexer 43 normally outputs the data of the one register of the register group 3 selected by the read register selection circuit 41. When the chaining enable signal SRCA1-CHEN is output, it instead outputs the one of the data DST1-DATA to DSTn-DATA that is output from the chaining circuit 42.
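The data path through a register read unit can be modeled with the following Python sketch. It assumes one-hot selection vectors and 0/1 signals, and it models the read signal as the enable ANDed with the selection vector; the function names and register contents are hypothetical.

```python
def read_register_select(registers, src_en, src_sel):
    """Model of circuit 41: gate the one-hot select with the enable and read."""
    src_re = [src_en & bit for bit in src_sel]
    # Data of the selected register, or 0 when no register is selected.
    data = next((value for value, re in zip(registers, src_re) if re), 0)
    return src_re, data


def mux43(register_data, chained_data, chen):
    """Model of multiplexer 43: chained data wins when CHEN is asserted."""
    return chained_data if chen else register_data


registers = [10, 20, 30, 40]          # contents of register group 3 (m = 4)
src_re, from_register = read_register_select(registers, 1, [0, 0, 1, 0])

normal = mux43(from_register, chained_data=99, chen=0)
chained = mux43(from_register, chained_data=99, chen=1)
print(src_re, normal, chained)
```

With the chaining enable deasserted the unit forwards the stored register value; with it asserted the unit forwards the bypassed result instead.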
[0031]
Here, the register read units 4-2 to 4-n have the same internal configuration as that of the register read unit 4-1 and thus the description thereof is omitted, and the input signals and output signals are as shown in the figure.
Next, the register read units 5-1 to 5-n selectively read either the data of a desired register in the register group 3 or one of the data DST1-DATA to DSTn-DATA from the arithmetic units 1-1 to 1-n, and supply the data thus read, SRCB1-DATA to SRCBn-DATA, to the corresponding arithmetic units 1-1 to 1-n.
[0032]
The register read units 5-1 to 5-n also generate the chaining enable signals SRCB1-CHEN to SRCBn-CHEN by means of a chaining circuit (not shown) and supply them to the corresponding arithmetic units 1-1 to 1-n.
Here, since the internal configuration of the register read units 5-1 to 5-n is the same as that of the register read units 4-1 to 4-n, the description thereof is omitted.
[0033]
The program control circuit 7 supplies the signals described above to each part so that, based on a vector instruction or a scalar instruction, the arithmetic units 1-1 to 1-n perform the corresponding operation and the operation result is either stored in a desired register of the register group 3 or input directly to the arithmetic units 1-1 to 1-n.
[0034]
Next, a specific configuration of the chaining circuit 42 will be described with reference to FIG. 2.
The chaining circuit 42 includes n comparators 421-1 to 421-n, n AND circuits 422-1 to 422-n, an OR circuit 423, and an OR circuit 424.
[0035]
The write enable signals DST1-WE [m] to DSTn-WE [m] from the register write circuit 2 are input to one input terminal of each of the comparators 421-1 to 421-n, and the read signal SRCA1-RE [m] from the read register selection circuit 41 is input to the other input terminal of each comparator.
Each output signal from the comparators 421-1 to 421-n is supplied to an OR circuit 423 to perform an OR operation, and the operation result is output as a chaining enable signal SRCA-CHEN. Each output signal from the comparators 421-1 to 421-n is supplied to one of the input terminals of the corresponding AND circuits 422-1 to 422-n.
[0036]
Further, the data DST1-DATA to DSTn-DATA from the arithmetic units 1-1 to 1-n are input to the other input terminals of the AND circuits 422-1 to 422-n. The output signals of the AND circuits 422-1 to 422-n are supplied to the OR circuit 424, where they are ORed together, and the result is output as data SRCA-DATA.
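The comparator, AND, and OR structure of the chaining circuit can be sketched behaviorally in Python. This is an illustrative model, not the circuit itself: it assumes one-hot m-bit write and read vectors, integer data words, and, as an added assumption not stated in the text, that an all-zero read vector never counts as a match. The function name `chaining_circuit` is hypothetical.

```python
def chaining_circuit(dst_we, src_re, dst_data):
    """Return (chen, data) for one source operand.

    dst_we   -- one m-bit write-signal vector per arithmetic unit
    src_re   -- the m-bit read-signal vector of this operand
    dst_data -- the result value of each arithmetic unit
    """
    # Comparators: unit k matches when it writes the register being read.
    # Assumption: an all-zero read vector (no read in progress) never matches.
    matches = [any(src_re) and we == src_re for we in dst_we]

    # First OR stage: any match raises the chaining enable signal.
    chen = any(matches)

    # AND gates pass only the matching unit's data; a second OR stage merges.
    data = 0
    for match, value in zip(matches, dst_data):
        data |= value if match else 0
    return chen, data


dst_we = [[0, 0, 0, 0], [0, 0, 1, 0]]   # unit 1-2 writes register 2
dst_data = [0x11, 0x22]

hit = chaining_circuit(dst_we, [0, 0, 1, 0], dst_data)    # reading register 2
miss = chaining_circuit(dst_we, [0, 1, 0, 0], dst_data)   # reading register 1
print(hit, miss)
```

Only the read of register 2 coincides with a pending write, so only that case raises the enable and forwards the result of unit 1-2.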
[0037]
Next, the arithmetic processing of the embodiment having this configuration will be described with reference to FIGS. 1 and 2.
This example covers the case in which the arithmetic unit 1-1, while executing a vector operation, uses the result of a scalar operation of the arithmetic unit 1-2 as an input, and the register storing the scalar data used in the vector operation of the arithmetic unit 1-1 is the same as the register storing the scalar data that is the operation result of the arithmetic unit 1-2.
[0038]
In this case, it is assumed that scalar data is stored in the register 13 of the arithmetic unit 1-1, vector data is stored in the register 14, and both data are calculated by the calculation block 15. Arithmetic unit 1-1 outputs data DST1-DATA as an operation result to register write circuit 2, register read units 4-1 to 4-n, and register read units 5-1 to 5-n, respectively.
[0039]
Here, as the scalar data stored in the register 13, data stored in an arbitrary register in the register group 3 is used.
Next, the operation is described for the case in which, while the arithmetic unit 1-1 is executing a vector operation, the scalar data DST2-DATA resulting from the scalar operation of the arithmetic unit 1-2 is passed to the arithmetic unit 1-1 through the chaining path.
[0040]
In this case, the scalar data DST2-DATA is stored by the register write circuit 2 in the same register of the register group 3 as mentioned above and, at the same time, is passed by the chaining circuit 42 of the register read unit 4-1 to become the data SRCA1-DATA, so that the contents of the register 13 of the arithmetic unit 1-1 are rewritten.
[0041]
That is, based on the enable signal DST2-EN from the arithmetic unit 1-2 and the cell designation signal DST2-SEL [m] from the program control circuit 7, the register write circuit 2 generates the write signal WR-EN [m] for the above register of the register group 3 and the write data WR-DATA [m] to be stored in that register. As a result, the register write circuit 2 stores the scalar data DST2-DATA in that register.
[0042]
The register write circuit 2 also performs a logical AND between each of the enable signals DST1-EN to DSTn-EN from the arithmetic units 1-1 to 1-n and the corresponding cell designation signal DST1-SEL [m] to DSTn-SEL [m] from the program control circuit 7, generates the write signals DST1-WE [m] to DSTn-WE [m], and outputs these signals to the chaining circuit 42 of the register read unit 4-1.
[0043]
In this case, the cell designation signal DST2-SEL [m] is output as the write signal DST2-WE [m] by the enable signal DST2-EN.
Further, the read register selection circuit 41 performs a logical AND of the enable signal SRCA1-EN and the cell selection signal SRCA1-SEL [m] from the program control circuit 7, and outputs the result, the read signal SRCA1-RE [m], to the chaining circuit 42.
[0044]
In the chaining circuit 42 shown in FIG. 2, the write enable signals DST1-WE [m] to DSTn-WE [m] from the register write circuit 2 are input to one input terminal of each of the comparators 421-1 to 421-n, and the read signal SRCA1-RE [m] from the read register selection circuit 41 is input to the other input terminal of each comparator.
[0045]
In this case, the write enable signal DST2-WE [m] input to the comparator 421-2 matches the read signal SRCA1-RE [m], so the match signal passes through the OR circuit 423 and is output as the chaining enable signal SRCA1-CHEN. The chaining enable signal SRCA1-CHEN is supplied to the multiplexer 43 and to the multiplexer 11 of the arithmetic unit 1-1.
[0046]
The AND circuit 422-2 receives the above match signal from the comparator 421-2 at one input terminal and the output data DST2-DATA computed by the arithmetic unit 1-2 at the other input terminal. The AND circuit 422-2 therefore outputs the output data DST2-DATA in response to the match signal, and this output data is supplied to the multiplexer 43.
[0047]
In response to the chaining enable signal SRCA1-CHEN, the multiplexer 43 selects the output data DST2-DATA output from the chaining circuit 42 so that it can be transferred to the register 13 of the arithmetic unit 1-1.
At this time, the chaining enable signal SRCA1-CHEN from the chaining circuit 42 is used, via the multiplexer, as an enable signal for the register 13. Therefore, the output data DST2-DATA is stored in the register 13.
[0048]
Therefore, because the operation result of the arithmetic unit 1-2 is passed as described above, the arithmetic unit 1-1 can perform a vector operation using the vector data stored in the register 14 and the scalar data stored in the register 13.
As described above, according to this embodiment, when chaining occurs, as in the case where the operation result of a scalar operation of the arithmetic unit 1-2 is used as an input while the arithmetic unit 1-1 is executing a vector operation, the necessary data is latched in the register of the arithmetic unit. Consequently, when chaining is not established, useless data is not latched in the register of the arithmetic unit, so that low power consumption can be realized.
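The effect of using the chaining enable signal as the latch enable can be illustrated with a small behavioral model. It is a sketch under stated assumptions: the class `EnabledRegister`, the function `step` and the separate `load_en` input (standing for an ordinary load of the register from the register group) are hypothetical, and the latch counter is only a stand-in for switching activity.

```python
class EnabledRegister:
    """Register that captures its input only when enabled."""

    def __init__(self, value=0):
        self.value = value
        self.latch_count = 0  # rough proxy for switching activity

    def clock(self, data, enable):
        if enable:
            self.value = data
            self.latch_count += 1


def step(reg13, regfile_data, chain_data, chen, load_en):
    """One clock cycle in front of the register 13.

    The selection models the multiplexer 43: chained data is taken
    when the chaining enable is set, register-group data otherwise.
    `load_en` is an assumed enable for ordinary (non-chained) loads.
    """
    selected = chain_data if chen else regfile_data
    reg13.clock(selected, enable=chen or load_en)


reg13 = EnabledRegister()
step(reg13, regfile_data=5, chain_data=0, chen=False, load_en=True)   # initial load
for _ in range(3):                                                    # idle cycles
    step(reg13, regfile_data=5, chain_data=0, chen=False, load_en=False)
step(reg13, regfile_data=5, chain_data=42, chen=True, load_en=False)  # chained update
```

Over the five cycles the register is latched only twice, once for the initial load and once when chaining is established, whereas a register that latched on every cycle would be clocked five times.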
[0049]
Next, a case will be described in which the vector processor according to the above-described embodiment is applied to DCT calculation processing, which is one type of image processing.
DCT (Discrete Cosine Transform) is an arithmetic process in which image data is multiplied by DCT coefficients, as in the following Expression (3).
ΣVR [i] * SR ... (3)
Here, VR [i] indicates a vector register storing image data, and SR indicates a scalar register storing DCT coefficients.
[0050]
Here, a vector register could be used for the DCT coefficients. However, most of the coefficients on the high-frequency side of image data are often “0”. Therefore, from the viewpoint of saving vector register resources, this embodiment writes to the scalar register only when a coefficient has a value other than “0”.
As a result, the pipeline register in the arithmetic unit also operates only when data is written to the scalar register, so that resources are saved and power consumption is reduced because unnecessary circuits are not operated.
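One possible reading of Expression (3) together with the zero-coefficient rule is sketched below. The interpretation is an assumption: terms whose coefficient is “0” are skipped entirely, so that neither the scalar register nor the pipeline register is updated for them. The function name `dct_accumulate` and the write counter are hypothetical, and integer data is used only to keep the example exact.

```python
def dct_accumulate(vrs, coeffs):
    """Accumulate sum over i of VR[i] * SR, skipping zero coefficients.

    vrs    : list of vector registers (equal-length lists), VR[i]
    coeffs : one DCT coefficient per vector register
    Returns the accumulated vector and the number of scalar writes.
    """
    acc = [0] * len(vrs[0])
    sr_writes = 0
    for vr, coeff in zip(vrs, coeffs):
        if coeff == 0:
            # No scalar register write and no latch; a zero term
            # would not change the sum anyway.
            continue
        sr = coeff  # models the write to the scalar register SR
        sr_writes += 1
        acc = [a + x * sr for a, x in zip(acc, vr)]
    return acc, sr_writes


result, writes = dct_accumulate(
    vrs=[[1, 2], [3, 4], [5, 6]],
    coeffs=[2, 0, 1],
)
```

Here only two of the three coefficients are non-zero, so the model performs two scalar writes while producing the same sum as a full evaluation of all three terms.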
[0051]
[Effect of the invention]
As described above, according to the present invention, when a chaining path is required, the register included in the arithmetic unit is not latched needlessly, so that resources and power consumption can be reduced.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of an embodiment of the present invention.
FIG. 2 is a circuit diagram showing a specific configuration of the chaining circuit of FIG.
[Explanation of symbols]
1-1 to 1-n are arithmetic units, 2 is a register write circuit, 3 is a register group, 4-1 to 4-n and 5-1 to 5-n are register read units, 6 is a register read circuit, 7 is a program control circuit, 11 and 12 are multiplexers, 13 and 14 are registers, 15 is an operation block, 16 is a control block, 41 is a read register selection circuit, 42 is a chaining circuit, and 43 is a multiplexer.

Claims

A vector processor comprising:
a register group consisting of a plurality of registers that store vector data and scalar data, respectively;
n arithmetic units that each perform an operation between two pieces of data using vector data or scalar data; and
2n chaining circuits that take in the operation results of the n arithmetic units and, when chaining is required, supply those operation results to the input side of a desired arithmetic unit,
wherein each of the chaining circuits generates a chaining enable signal for the corresponding arithmetic unit when the chaining is possible,
each of the arithmetic units includes a storage element that temporarily stores one of the data from the register group and the operation data from the n arithmetic units, and
the storage element stores, when the chaining enable signal is present, the operation data from the arithmetic unit involved in the chaining.

A data processing method for a vector processor in which the operation result of a scalar operation of a second arithmetic unit can be used while a first arithmetic unit is executing a vector operation, the method comprising:
detecting whether the register storing the scalar data used in the vector operation of the first arithmetic unit matches the register storing the scalar data that is the operation result of the second arithmetic unit, and generating a chaining enable signal when they match; and
when the chaining enable signal is generated, passing the output data from the second arithmetic unit to the first arithmetic unit and latching it in a pipeline register of the first arithmetic unit.