JP3594212B2 - Associative memory

Info

Publication number: JP3594212B2
Application number: JP11648397A
Authority: JP
Inventors: 剛池永; 武小倉
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1997-04-18
Filing date: 1997-04-18
Publication date: 2004-11-24
Anticipated expiration: 2017-04-18
Also published as: JPH10293768A

Description

[0001]
[Technical field of the invention]
The present invention relates to an associative memory used as a massively parallel processing element (PE) when configuring various massively parallel computing devices such as an image processing system.
[0002]
[Prior art]
As network services become more visual and more value-added, the need for advanced image processing, sound processing, and knowledge processing is increasing. However, because such processing generally demands enormous processing performance, it is often difficult to execute with existing microprocessors or signal processors based on the von Neumann architecture.
[0003]
Therefore, as a device effective for such processing, a massively parallel computing device is known. It mounts a large number of processing elements (PEs) that perform various logic and arithmetic operations and uses the single instruction stream / multiple data stream (SIMD) scheme: a single control circuit gives one instruction sequence to all PEs, so that every PE executes the operation at the same time.
[0004]
An associative memory is known as an integrated circuit that can realize the above massively parallel computing device with a very small amount of hardware. This associative memory is disclosed in Ogura, T. et al., "A 20-kbit Associative Memory LSI for Artificial Intelligence Machines", IEEE J. Solid-State Circuits, Vol. 24, No. 4, pp. 1014-1020, Aug. 1989.
[0005]
FIG. 4 is a diagram showing a conventional associative memory M0.
[0006]
The conventional associative memory M0 includes an address search mask register 44, a data search mask register 45, a write mask register 46, an address decoder 47, a hit flag register 410, a word (read only) 48, and a word (readable and writable) 49.
[0007]
Like a normal memory, the conventional associative memory M0 can read and write data in an arbitrary word 49 via the data input/output port 42 when an address value is given to the address input/output port 41.
[0008]
The conventional associative memory M0 also has a mask search function, which collates search data given via the data input/output port 42 with the contents of the words in parallel and sets a hit flag for each matching word, and a parallel partial write function, which writes data given via the data input/output port 42 in parallel to the words whose hit flags are set. By using the mask search function and the parallel partial write function, various data transfer, logic, and arithmetic operations can be executed in a word-parallel, bit-serial manner. That is, each word can be used as one of the processing elements described above.
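The two functions can be sketched in Python as follows. This is only an illustrative model, not the circuit of the cited LSI: each word is modeled as an integer, a 1 bit in a mask is assumed to mark a bit that takes part in the operation (the text does not fix the mask polarity), and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AssociativeMemory:
    """Toy model: one int per word, one hit flag per word."""
    words: list
    hits: list = field(default_factory=list)

    def mask_search(self, data: int, search_mask: int) -> None:
        # Compare every word with the search data on the unmasked bits only
        # and set the hit flag of each matching word (conceptually in parallel).
        self.hits = [(w & search_mask) == (data & search_mask) for w in self.words]

    def parallel_partial_write(self, data: int, write_mask: int) -> None:
        # Overwrite the unmasked bits of every word whose hit flag is set.
        self.words = [
            (w & ~write_mask) | (data & write_mask) if hit else w
            for w, hit in zip(self.words, self.hits)
        ]


cam = AssociativeMemory(words=[0b0101, 0b0100, 0b1101])
cam.mask_search(data=0b0001, search_mask=0b0001)            # words whose bit 0 is 1
cam.parallel_partial_write(data=0b0010, write_mask=0b0010)  # set bit 1 in those words
```

In this model every word acts as a one-bit-wide processing element: a search selects the words to act on, and the following write updates only those words.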
[0009]
The data transfer and the arithmetic processing are executed by setting a search mask and a write mask for the processing target bit and the flag bit, and repeating the mask search and the parallel partial writing.
[0010]
[Problems to be solved by the invention]
Many massively parallel algorithms for image processing, sound processing, knowledge processing, and the like require multi-bit data transfer and arithmetic processing. For example, processing an image with 256 gray levels requires 8-bit data transfer and arithmetic processing. Because an associative memory performs data transfer and arithmetic processing bit-serially, these multi-bit operations generally account for a large share of the processing time. To shorten the overall processing time and realize a massively parallel computing device applicable to more applications, the data transfer and arithmetic processing should finish in as short a time as possible.
[0011]
In the conventional associative memory M0, the search mask and the write mask are set by directly supplying mask values via the address input/output port 41 and the data input/output port 42. Consequently, when multi-bit transfer or arithmetic is performed in the conventional associative memory M0, the search mask and the write mask must be set again for every bit, each in its own separate cycle, which makes data transfer and arithmetic processing slow.
[0012]
An object of the present invention is to provide an associative memory with short multi-bit data transfer and arithmetic processing times.
[0013]
[Means for Solving the Problems]
The present invention is an associative memory having a data search mask register, a write mask register, a word section, an address decoder, a hit flag register, and a control unit that controls them, in which the control unit has: means for shifting the data search mask register and the write search mask register; means for simultaneously executing the shift operation of the search mask register and a parallel partial write operation on the word section; and means for simultaneously executing the shift operation of the write mask register and a mask search operation on the word section.
[0014]
[Embodiments and examples of the invention]
FIG. 1 is a block diagram showing a basic configuration of an associative memory M1 according to one embodiment of the present invention.
[0015]
The associative memory M1 includes an address search mask register 14, data search mask registers 15, 16, write mask registers 17, 18, an address decoder 19, a word (read only) 110, a word (read / write) 111, It has a hit flag register 112 and a control unit 113.
[0016]
The address search mask register 14 is composed of a normal register and cannot perform a shift operation. The data search mask register 15 is constituted by a shift register, and is capable of 1-bit shift up / down operation. The data search mask register 16 is a normal register and cannot perform a shift operation. The write mask register 17 is constituted by a shift register, and is capable of performing a 1-bit shift up / down operation. The write mask register 18 is a normal register and cannot perform a shift operation.
[0017]
The associative memory M1 has a mask search function and a parallel partial write function.
[0018]
The mask search function collates the search data given via the data input/output port 12 with the contents of the words in parallel and sets a hit flag for each word that matches. When an instruction indicating this mode is given via the instruction input port 13, the control unit 113 decodes the instruction, activates the mask search activation control line 114, and operates the search mask registers 14, 15, and 16, the words 110 and 111, and so on, thereby executing the decoded instruction.
[0019]
The parallel partial write function writes data given via the data input/output port 12 in parallel to the words whose hit flags are set. When an instruction indicating this mode is given via the instruction input port 13, the control unit 113 decodes the instruction, activates the parallel partial write activation control line 116, and operates the write mask registers 17 and 18, the words 110 and 111, and so on, thereby executing the decoded instruction.
[0020]
By using the mask search function and the parallel partial write function, the associative memory M1 can execute various data transfer, logic, and arithmetic operations in a word-parallel, bit-serial manner.
[0021]
The associative memory M1 also has a function of simultaneously performing the shift-up/down operation of the data search mask register 15 and the parallel partial write operation on the word 111. When an instruction indicating this mode is given via the instruction input port 13, the control unit 113 decodes the instruction and activates both the search mask register shift activation control line 115 and the parallel partial write activation control line 116, thereby executing the decoded instruction.
[0022]
Further, the associative memory M1 has a function of simultaneously performing the shift-up/down operation of the write mask register 17 and the mask search on the word 111. When an instruction indicating this mode is given via the instruction input port 13, the control unit 113 decodes the instruction and activates both the write mask register shift activation control line 117 and the mask search activation control line 114, thereby executing the decoded instruction.
[0023]
In both the function that simultaneously performs the shift operation of the data search mask register 15 and the parallel partial write operation, and the function that simultaneously performs the shift operation of the write mask register 17 and the mask search operation, only one of the shift, mask search, and parallel partial write operations is selected for a mask register at a time, so the operations of the mask registers 14, 15, and 16 do not conflict with each other.
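The decoding described above can be pictured as a small lookup from instruction to activated control lines. The sketch below is an assumption-laden illustration: the line numbers 114 to 117 follow the reference numerals in the text, but the instruction names and the `decode` and `has_conflict` helpers are hypothetical and do not come from the patent.

```python
from enum import Enum

# Control lines, numbered as in the text.
MASK_SEARCH = 114
SEARCH_MASK_SHIFT = 115
PARTIAL_WRITE = 116
WRITE_MASK_SHIFT = 117


class Instruction(Enum):  # hypothetical instruction names
    SEARCH = "search"
    WRITE = "write"
    WRITE_AND_SHIFT_SEARCH_MASK = "write_and_shift_search_mask"
    SEARCH_AND_SHIFT_WRITE_MASK = "search_and_shift_write_mask"


def decode(instr: Instruction) -> frozenset:
    """Return the set of control lines the control unit activates."""
    table = {
        Instruction.SEARCH: {MASK_SEARCH},
        Instruction.WRITE: {PARTIAL_WRITE},
        Instruction.WRITE_AND_SHIFT_SEARCH_MASK: {SEARCH_MASK_SHIFT, PARTIAL_WRITE},
        Instruction.SEARCH_AND_SHIFT_WRITE_MASK: {WRITE_MASK_SHIFT, MASK_SEARCH},
    }
    return frozenset(table[instr])


def has_conflict(lines: frozenset) -> bool:
    # A conflict would mean a mask register is used and shifted in one cycle:
    # search + search-mask shift, or write + write-mask shift.
    return {MASK_SEARCH, SEARCH_MASK_SHIFT} <= lines or {PARTIAL_WRITE, WRITE_MASK_SHIFT} <= lines


conflicts = [i for i in Instruction if has_conflict(decode(i))]
```

Each combined instruction pairs an operation that uses one kind of mask with a shift of the other kind, which is why no register is asked to do two things in the same cycle.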
[0024]
Next, a data transfer procedure when the associative memory M1 is used will be described.
[0025]
FIG. 2 is a flowchart showing a data transfer procedure in the above embodiment. In the following, a procedure for transferring data of n bits (n is an arbitrary natural number) of the transfer source field 21 to n bits of the transfer destination field 22 will be described as an example. Further, the transfer source field 21 and the transfer destination field 22 are assigned to words corresponding to the mask registers 15 and 17 that can perform the shift operation.
[0026]
First, the most significant bit is transferred (X_1 → Y_1). In cycle 1, the search mask is set so that everything other than the most significant bit (X_1) of the transfer source field 21 is masked. In cycle 2, the write mask is set so that everything other than the most significant bit (Y_1) of the transfer destination field 22 is masked. In cycle 3, a mask search is performed on the most significant bit (X_1) of the transfer source field 21, and the data is transferred to the hit flag register 112.
[0027]
Then, in cycle 4, a parallel partial write is executed on the most significant bit (Y_1) of the transfer destination field 22, transferring the data in the hit flag register 112 to bit Y_1, and at the same time the search mask register 15 is shifted down. This shift-down masks everything other than the second bit from the top (X_2) of the transfer source field 21. This completes the transfer of the most significant bit (X_1 → Y_1).
[0028]
Next, the second bit from the top is transferred (X_2 → Y_2). In cycle 5, a mask search is performed on the second bit from the top (X_2) of the transfer source field 21, transferring the data to the hit flag register 112, and at the same time the write mask register 17 is shifted down. This shift-down masks everything other than the second bit from the top (Y_2) of the transfer destination field 22. Then, in cycle 6, a parallel partial write is performed on the second bit from the top (Y_2) of the transfer destination field 22, transferring the data in the hit flag register 112 to bit Y_2, and at the same time the search mask register 15 is shifted down. This completes the transfer of the second bit from the top (X_2 → Y_2).
[0029]
Executing the same procedure for all n bits completes the n-bit data transfer (up to X_n → Y_n).
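The transfer procedure can be traced with a short Python sketch that counts cycles. It rests on several assumptions that go beyond the text: words are modeled as integers, a 1 bit in a mask marks the one unmasked bit, and the hit flag value itself is written into Y_i for every word (the text says the hit flag data is transferred to Y_i but does not detail the write mechanism). The function name and the field layout in the example are hypothetical.

```python
def transfer_field(words, src_msb, dst_msb, n):
    """Copy an n-bit source field to a destination field, MSB first.

    Returns (new_words, cycles).
    """
    words = list(words)
    search_mask = 1 << src_msb  # cycle 1: only X_1 is left unmasked
    write_mask = 1 << dst_msb   # cycle 2: only Y_1 is left unmasked
    cycles = 2
    for bit in range(n):
        # Mask search on the current source bit; the hit flags now hold it.
        hits = [bool(w & search_mask) for w in words]
        if bit > 0:
            write_mask >>= 1    # write mask shifts down in this same cycle
        cycles += 1
        # Parallel partial write of the hit flag into the destination bit.
        words = [
            (w & ~write_mask) | (write_mask if hit else 0)
            for w, hit in zip(words, hits)
        ]
        search_mask >>= 1       # search mask shifts down in this same cycle
        cycles += 1
    return words, cycles


# Source field in bits 7-4, destination field in bits 3-0 (an assumed layout).
new_words, cycles = transfer_field([0b10100000, 0b01101111], src_msb=7, dst_msb=3, n=4)
```

Because each shift shares a cycle with a search or a write, only the two initial mask settings cost cycles of their own.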
[0030]
As described above, with the associative memory M1, setting the search and write masks can be overlapped with the mask search and the parallel partial write from the second bit onward, so an n-bit transfer finishes in 2n + 2 cycles.
[0031]
In the conventional associative memory, by contrast, the search mask and the write mask must each be set in a separate cycle for every bit, so transferring n bits takes 4n cycles. For example, an 8-bit transfer takes 18 cycles with the associative memory M1 of the embodiment but 32 cycles with the conventional associative memory M0, so the embodiment completes the transfer in 56% of the time required by the conventional example.
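The two cycle counts can be checked with a short calculation, assuming (as the transfer procedure describes) four cycles for the first bit and two for each later bit in M1, and four cycles per bit in M0. The symbols T_M1 and T_M0 are labels introduced here for illustration only.

```latex
\begin{align*}
T_{\mathrm{M1}}(n) &= 4 + 2(n-1) = 2n + 2, &
T_{\mathrm{M0}}(n) &= 4n, \\
T_{\mathrm{M0}}(n) - T_{\mathrm{M1}}(n) &= 2(n-1), \\
\frac{T_{\mathrm{M1}}(8)}{T_{\mathrm{M0}}(8)} &= \frac{18}{32} \approx 0.56 .
\end{align*}
```

The counts coincide at n = 1, and M1 saves two cycles for every bit after the first.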
[0032]
Next, the procedure of the arithmetic processing when the associative memory M1 is used will be described.
[0033]
FIG. 3 is a flowchart showing the procedure of the addition process in the above embodiment. Here, an example is described in which the n-bit data (n is an arbitrary natural number) of the augend update field 31 and the n-bit data of the addend field 32 are added and the result is stored in the augend update field 31.
[0034]
The augend update field 31 and the addend field 32 are assigned to the words corresponding to the shiftable mask registers 15 and 17, and the carry field (C) 33, which stores carry data, is assigned to the words corresponding to the non-shiftable mask registers 16 and 18.
[0035]
First, the least significant bit is added (X_1 + Y_1 → X_1). In cycle 1, the search mask is set so that everything other than the least significant bit (X_1) of the augend update field 31, the least significant bit (Y_1) of the addend field 32, and the carry field 33 is masked. In cycle 2, the write mask is set so that everything other than the least significant bit (X_1) of the augend update field 31 and the carry field 33, which stores carry data, is masked.
[0036]
Then, in cycle 3, a mask search is performed for (X_1, Y_1, C) = (0, 0, 1). That is, words satisfying X_1 = 0, Y_1 = 0, and C (carry field 33) = 1 are searched for, and a hit flag is set for each such word. Next, in cycle 4, a parallel partial write is performed with (X_1, C) = (1, 0); that is, X_1 = 1 and C = 0 are written. In cycle 5, a mask search is performed for words with (X_1, Y_1, C) = (1, 0, 1), and in cycle 6 a parallel partial write is performed with (X_1, C) = (0, 1); that is, X_1 = 0 and C = 1 are written. In cycle 7, a mask search is performed for (X_1, Y_1, C) = (1, 1, 0), and in cycle 8 a parallel partial write is performed with (X_1, C) = (0, 1). In cycle 9, a mask search is performed for (X_1, Y_1, C) = (0, 1, 0), and in cycle 10 a parallel partial write is performed with (X_1, C) = (1, 0) while, at the same time, the search mask register 15 is shifted up.
[0037]
This shift-up of the search mask register 15 masks everything other than the second-lowest bit (X_2) of the augend update field 31 and the second-lowest bit (Y_2) of the addend field 32. Because the carry field 33 is assigned to the non-shifting search mask register 16, its mask position does not move. This completes the addition of the least significant bit (X_1 + Y_1 → X_1).
[0038]
Note that when the least significant bits X_1 and Y_1 are added, no mask search is performed for words in which (X_1, Y_1, C) is (0, 1, 1), (1, 1, 1), (0, 0, 0), or (1, 0, 0). For these words the addition leaves the value of X_1 unchanged, so no update is needed. The same applies when bits other than the least significant bit are added.
[0039]
Next, the second-lowest bit is added (X_2 + Y_2 + C → X_2). In cycle 11, a mask search is performed for (X_2, Y_2, C) = (0, 0, 1), and at the same time the write mask register 17 is shifted up. This shift-up of the write mask register 17 masks everything other than the second-lowest bit (X_2) of the augend update field 31. Because the carry field 33 is assigned to the non-shifting write mask register 18, its mask position does not move. Then, in cycle 12, a parallel partial write is performed with (X_2, C) = (1, 0). Thereafter, processing proceeds in the same way as cycles 4 through 10 above. This completes the addition of the second-lowest bit (X_2 + Y_2 + C → X_2).
[0040]
Then, by performing the same procedure as the above procedure for all n bits, the addition processing of n bits can be performed.
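The addition procedure can be traced with the following Python sketch. It is an illustration under stated assumptions, not the circuit itself: words are modeled as integers, the bit layout in the example is assumed, the function name is hypothetical, and the mask shifts are represented simply by advancing the bit index, since the text overlaps them with a search or a write.

```python
# (X_i, Y_i, C) patterns to search for, each paired with the (X_i, C) values
# then written, in the order the text gives them. The other four patterns
# already hold the correct sum bit and carry, so they are never searched.
RULES = [
    ((0, 0, 1), (1, 0)),
    ((1, 0, 1), (0, 1)),
    ((1, 1, 0), (0, 1)),
    ((0, 1, 0), (1, 0)),
]


def add_fields(words, x_lsb, y_lsb, c_bit, n):
    """Add the n-bit Y field into the n-bit X field of every word, LSB first.

    Returns (new_words, cycles).
    """
    words = list(words)
    carry = 1 << c_bit
    cycles = 2  # cycles 1 and 2: set the search mask and the write mask
    for i in range(n):
        # Advancing i stands in for the mask shift-up, which shares a cycle
        # with a search or a write and so costs no cycle of its own.
        xb, yb = 1 << (x_lsb + i), 1 << (y_lsb + i)
        for (sx, sy, sc), (wx, wc) in RULES:
            # Mask search: flag the words whose (X_i, Y_i, C) match the pattern.
            pattern = (bool(sx), bool(sy), bool(sc))
            hits = [
                (bool(w & xb), bool(w & yb), bool(w & carry)) == pattern
                for w in words
            ]
            cycles += 1
            # Parallel partial write of (X_i, C) into the flagged words only.
            new_bits = (xb if wx else 0) | (carry if wc else 0)
            words = [
                (w & ~(xb | carry)) | new_bits if hit else w
                for w, hit in zip(words, hits)
            ]
            cycles += 1
    return words, cycles


# X in bits 0-3, Y in bits 4-7, carry in bit 8 (an assumed layout).
# Word 0 holds X=5, Y=3; word 1 holds X=15, Y=1.
result, cycles = add_fields([0x035, 0x01F], x_lsb=0, y_lsb=4, c_bit=8, n=4)
```

The order of the four rules matters: each write produces a pattern that has either already been searched for or is never searched, so no word is updated twice for the same bit.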
[0041]
As described above, with the associative memory M1, setting the search and write masks can be overlapped with the mask search and the parallel partial write from the second bit onward, so an n-bit addition finishes in 8n + 2 cycles.
[0042]
In the conventional associative memory M0, by contrast, the search mask and the write mask must each be set in a separate cycle for every bit, so adding n bits takes 10n cycles. For example, an 8-bit addition takes 66 cycles with the associative memory M1 of the embodiment but 80 cycles with the conventional associative memory M0, so the embodiment completes the addition in 83% of the time required by the conventional example.
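The addition counts follow from the same kind of bookkeeping, assuming ten cycles for the first bit (two mask settings plus four search/write pairs) and eight for each later bit in M1, against ten cycles per bit in M0. The symbols A_M1 and A_M0 are labels introduced here for illustration only.

```latex
\begin{align*}
A_{\mathrm{M1}}(n) &= 10 + 8(n-1) = 8n + 2, &
A_{\mathrm{M0}}(n) &= 10n, \\
A_{\mathrm{M0}}(n) - A_{\mathrm{M1}}(n) &= 2(n-1), \\
\frac{A_{\mathrm{M1}}(8)}{A_{\mathrm{M0}}(8)} &= \frac{66}{80} \approx 0.83 .
\end{align*}
```

The absolute saving is again two cycles per bit after the first; the relative gain is smaller than for data transfer because each bit of an addition spends eight cycles on searches and writes that cannot be overlapped away.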
[0043]
In other words, in the data transfer and arithmetic processing described above, each bit from the second onward is one bit above or below the previously processed bit. The search and write mask positions for the next bit are therefore the mask register values for the previous bit shifted up or down by one. Accordingly, the search mask register 15 and the write mask register 17 are made shiftable, and mode means are provided that perform the shift operation simultaneously with the mask search or the parallel partial write. Using these mode means from the second bit onward eliminates the separate cycles conventionally needed to set the search and write masks, which speeds up multi-bit data transfer and arithmetic processing.
[0044]
In arithmetic processing, on the other hand, the positions of flag bits and the like are fixed, so some operations cannot be processed correctly if the mask register is shifted. Therefore, non-shifting registers are provided among the mask registers, and the mask bits for flag bits and the like are held in these non-shifting registers. This allows arithmetic processing such as that described above to be applied.
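A minimal Python sketch of this split, assuming the full mask over a word is the concatenation of a shiftable part and a fixed part. The 8-bit width of the shiftable part, the bit layout, and the helper names are assumptions made for illustration.

```python
SHIFT_WIDTH = 8  # assumed width of the shiftable part of the mask


def shift_up(shift_part: int) -> int:
    # One-bit shift-up confined to the shiftable part of the mask.
    return (shift_part << 1) & ((1 << SHIFT_WIDTH) - 1)


def full_mask(shift_part: int, fixed_part: int) -> int:
    # The fixed part sits above the shiftable part and never moves.
    return (fixed_part << SHIFT_WIDTH) | shift_part


# X in bits 0-3 and Y in bits 4-7 share the shiftable part; the carry bit
# lives in the fixed part (bit 8 of the full mask).
search_shift = 0b0001_0001  # selects X_1 and Y_1
search_fixed = 0b1          # selects the carry bit

before = full_mask(search_shift, search_fixed)
after = full_mask(shift_up(search_shift), search_fixed)
```

A single shift moves the selection from (X_1, Y_1) to (X_2, Y_2) while the carry selection stays at the same position, which is what the addition procedure relies on.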
[0045]
In short, the associative memory M1 is an associative memory having a data search mask register, a write mask register, a word section, an address decoder, a hit flag register, and a control unit that controls them, in which the control unit has means for shifting the data search mask register and the write search mask register, means for simultaneously executing the shift operation of the search mask register and the parallel partial write operation on the word section, and means for simultaneously executing the shift operation of the write mask register and the mask search operation on the word section. This realizes the data transfer processing shown in FIG. 2.
[0046]
Note that the control unit 113 has both the function corresponding to the means for simultaneously executing the shift operation of the search mask register and the parallel partial write operation on the word section, and the function corresponding to the means for simultaneously executing the shift operation of the write mask register and the mask search operation on the word section.
[0047]
Also, in the associative memory M1, the means for shifting the data search mask register and the write search mask register shifts only some of the bits. As a result, some bits cannot be shifted, and by using these bits for the carry, the addition process shown in FIG. 3 is realized.
[0048]
[Effect of the invention]
According to the present invention, in an associative memory having search and write mask registers, words, an address decoder, and a hit flag register, part of the search and write mask registers is constituted by shiftable registers, and the memory is provided with mode means for simultaneously shifting the search mask register and performing a parallel partial write on the words, and mode means for simultaneously shifting the write mask register and performing a mask search on the words. This has the effect of shortening multi-bit data transfer and arithmetic processing times.
[Brief description of the drawings]
FIG. 1 is a block diagram of a basic configuration of an associative memory M1 according to an embodiment of the present invention.
FIG. 2 is a flowchart showing a data transfer procedure in the embodiment.
FIG. 3 is a flowchart illustrating a procedure of an addition process in the embodiment.
FIG. 4 is a diagram showing a conventional associative memory M0.
[Explanation of symbols]
M1 ... associative memory,
14 ... address search mask register,
15, 16 ... data search mask registers,
17, 18 ... write mask registers,
19 ... address decoder,
110 ... word (read only),
111 ... word (readable and writable),
112 ... hit flag register,
113 ... control unit,
21 ... transfer source field,
22 ... transfer destination field,
31 ... augend update field,
32 ... addend field,
33 ... carry field.

Claims

In an associative memory having a data search mask register, a write mask register, a word section, an address decoder, a hit flag register, and a control unit for controlling these,
an associative memory characterized in that the control unit has:
means for shifting the data search mask register and the write search mask register;
means for simultaneously executing a shift operation of the search mask register and a parallel partial write operation on the word section; and
means for simultaneously executing a shift operation of the write mask register and a mask search operation on the word section.

The associative memory according to claim 1, wherein the means for shifting the data search mask register and the write search mask register is means for shifting only some of the bits.