JP2002312237A - Processor

Info

Publication number: JP2002312237A
Application number: JP2001112448A
Authority: JP
Inventors: Harutaka Goto; 藤治隆後; Kenju Osanai; 建樹小山内
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2001-04-11
Filing date: 2001-04-11
Publication date: 2002-10-25

Abstract

PROBLEM TO BE SOLVED: To provide a processor capable of shortening the latency of load/store instructions. SOLUTION: This processor is provided with a general-purpose register 1, an adder 2 for calculating virtual addresses, a TLB 3 for translating a virtual address into a physical address, a D-Cache 4, a D-tag 5, a data memory 6 that is smaller and faster than the D-Cache 4, a comparator 9 for comparing the output of the TLB 3 with the output of the D-tag 5, and a comparator 10 for comparing the data read from the D-Cache 4 with the data read from the data memory 6. When a load instruction is executed, the subsequent instruction is executed using the data read from the data memory 6 without waiting for the hit detection results of the D-Cache 4 and the data memory 6, so the latency of the load instruction is shortened and the performance of the processor is improved.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a processor that has a built-in cache memory and can execute load/store instructions.

[0002]

[Prior Art] In recent years, the operating frequency of microprocessors has risen sharply. The operating frequency of semiconductor memory has not risen nearly as much, so it has become difficult to access the cache memory within one cycle of the system clock. As a result, the number of cycles needed to execute a load instruction (hereinafter, latency) has been growing.

[0003] In many processors, a load instruction has a longer latency than an arithmetic instruction such as ADD. Therefore, if an instruction uses the result of a load instruction immediately after that load, it cannot be executed correctly without a pipeline interlock (stall). This is called the delay slot of the load instruction.

[0004] In general, the result of a load instruction is likely to be referenced by a load or arithmetic instruction executed immediately after it. Consequently, the longer the load latency, the more readily bubbles arise in the pipeline, degrading processor performance.

[0005] To limit this performance loss, efforts have been made to fill the load delay slots in software, but filling all of them in software is generally difficult. Shortening the load latency is therefore very important for processor performance.

[0006]

[Problems to Be Solved by the Invention] FIG. 9 shows an example of a pipeline in which the load latency is 2. The pipeline of FIG. 9 performs instruction fetch in the I stage, instruction staging in the Q stage, general-purpose register read in the R stage, arithmetic execution and load address calculation in the A stage, data cache access and data fetch in the D stage, and general-purpose register write in the W stage. When there is a RAW hazard between instructions, a bypass from the W stage to the A stage is permitted. The lw (load word) and add instructions have a RAW hazard between them; the stages shown in lowercase in the figure are pipeline stalls, and they create bubbles in the pipeline.

[0007] However, because only one cycle, the D stage, is spent on the data fetch, there are few bubbles in the pipeline.

[0008] FIG. 10 shows an example of a pipeline in which the load latency is 3. In FIG. 10, the D stage is split into two cycles, the D1 and D2 stages, so the pipeline contains more bubbles than in FIG. 9.

[0009] As these examples show, a short load latency is very important for processor performance.
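The relationship between load latency and bubbles in FIGS. 9 and 10 can be illustrated with a small sketch. It is not part of the original specification. It assumes a single dependent instruction issued directly behind the load and a bypass path that forwards the loaded value as soon as it is available; the function name `load_use_bubbles` is hypothetical.

```python
def load_use_bubbles(load_latency: int) -> int:
    """Stall cycles seen by an instruction that needs the load result and
    issues in the very next cycle (assumed model with full bypassing)."""
    if load_latency < 1:
        raise ValueError("latency must be at least one cycle")
    # A latency-1 producer feeds the next instruction with no stall;
    # each additional cycle of latency costs one bubble.
    return load_latency - 1


# Latencies taken from the FIG. 9 and FIG. 10 examples.
for latency in (2, 3):
    print(f"latency {latency}: {load_use_bubbles(latency)} bubble(s)")
```

Under these assumptions, every extra D stage adds one bubble to each load-use pair that software cannot separate.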

[0010] FIG. 11 is a block diagram showing the schematic configuration of the load/store instruction execution unit of a conventional processor. FIG. 11 shows an example in which the D stage is divided into four stages, D1 to D4.

[0011] First, in the R stage, the base address is read from the general-purpose register (GPR) 1, and in the following A stage the adder 2 calculates the virtual address. In the next D1 stage, the calculated virtual address is supplied to the TLB 3 and is also stored in the store buffer 7 for the data tag (D-tag) 5.

[0012] The TLB 3 translates the virtual address into a physical address and outputs the translated physical address in the D3 stage. The data cache (D-Cache) 4 takes three cycles, D1 to D3, to write and read data. Data to be written to the D-Cache 4 is first held in the store buffer 8. Data read from the D-Cache 4 is written back to the general-purpose register in the W stage.

[0013] The D4 stage is provided for bypassing: the result of a load instruction is bypassed to the R stage of a subsequent instruction that has a RAW hazard with it.

[0014] Because it is technically difficult to raise the frequency of the SRAM that makes up the D-Cache 4 in step with the rise in processor operating frequency, a processor with a high operating frequency has no choice but to spend many cycles on each cache memory access. The load latency therefore grows, a large number of bubbles enter the pipeline, and performance gains are hindered.

[0015] FIG. 12 is a layout diagram showing an example floor plan of a processor that incorporates the load/store instruction execution unit of FIG. 11. Because the cache memory is physically large, there is little freedom in placing it on the chip, and it tends to end up far from the integer unit. Since the general-purpose registers and bypass paths are normally placed inside the integer unit, the distance between the D-Cache and the operand bypass paths becomes large.

[0016] In a processor with a high operating frequency, wiring delay strongly affects processing performance, so the large distance between the D-Cache and the operand bypass paths is a major constraint on timing design.

[0017] The present invention has been made in view of these points, and its object is to provide a processor capable of shortening the latency of load/store instructions.

[0018]

[Means for Solving the Problems] To solve the problems described above, the present invention comprises: a cache memory that temporarily stores data to be stored in an external memory or data to be read from the external memory; a data memory that stores at least part of the data stored in the cache memory, has a smaller capacity than the cache memory, and has a faster access speed than the cache memory; and an instruction control unit that accesses the cache memory via a tag unit storing the addresses corresponding to the stored data, and accesses the data memory without going through a tag unit.

[0019] In the present invention, a data memory with a smaller capacity and faster access than the cache memory is provided, and when a load/store instruction is executed, speculative execution proceeds using the result of the data memory access before the hit/miss result of the cache memory is available. The latency of load/store instructions can therefore be shortened and the processing performance of the processor improved.

[0020]

[Embodiments of the Invention] A processor according to the present invention will now be described in detail with reference to the drawings.

[0021] (First Embodiment) FIG. 1 is a block diagram showing the schematic configuration of the load/store unit (instruction control unit) built into the processor of the first embodiment. The processor of FIG. 1 performs pipeline processing in seven stages: the R stage, the A stage, the D1 to D4 stages, and the W stage. To make the processing in each stage clear, FIG. 1 draws each part of the load/store unit against the stage to which it relates.

[0022] The processor of FIG. 1 comprises a general-purpose register 1; an adder 2 that calculates a virtual address; a TLB (Translation Lookaside Buffer) 3 that translates the virtual address into a physical address; a data cache memory (D-Cache) 4; a data tag memory (D-tag) 5 that stores the addresses corresponding to the data held in the D-Cache 4; a data memory (L0 memory: level 0 memory) 6 that is smaller and faster than the D-Cache 4; a store buffer 7 that temporarily holds the virtual address; a store buffer 8 that temporarily holds data to be stored in the D-Cache 4; a comparator 9 that compares the output of the TLB 3 with the output of the D-tag 5; and a comparator 10 that compares the data read from the D-Cache 4 with the data read from the data memory 6.

[0023] The general-purpose register 1 comprises a GPROTF (General Purpose Register On The Fly) 1a, in which temporary data that is not yet committed is successively overwritten, and a GPRArch (General Purpose Register Architectural) 1b, which holds only committed data.

[0024] In the A stage, the adder 2 adds the base address read from the general-purpose register 1 and an offset address to generate a virtual address. This virtual address is supplied to the TLB 3. The lower bits of the address are stored in the store buffer 7 as an index.

[0025] During the D1 to D3 stages, the TLB 3 translates the virtual address into a physical address. The comparator 9 compares the physical address output from the TLB 3 with the address, output from the D-tag 5, that corresponds to the data in the D-Cache 4. If the two match, it outputs a signal indicating a cache hit; if they do not match, it outputs a signal indicating a cache miss.

[0026] When the comparator 9 outputs the cache hit signal, the D-Cache 4 outputs the corresponding data.

[0027] The D-Cache 4 is the fastest of the cache memories built into or attached externally to the processor, and is also called the level-1 cache memory. The data memory 6, by contrast, is faster than the D-Cache 4 and has a smaller capacity. Also, unlike the D-Cache 4, the data memory 6 has no tag unit for storing tag information.

[0028] When a load/store instruction is issued, the data memory 6 is accessed along with the D-Cache 4, and part of the data stored in the D-Cache 4 is stored in the data memory 6.

[0029] The hit/miss check for the data memory 6 is made by comparing the data output of the data memory 6 with the data output of the D-Cache 4. This comparison is performed by the comparator 10 of FIG. 1. When the comparator 10 detects a match, the access is judged to have hit in the data memory 6.

[0030] Compared with a D-Cache 4 of the same capacity, the data memory 6 has a higher hit rate. The reason is as follows. Suppose a load instruction is issued and accesses the D-Cache 4, and the contents of the D-tag 5 differ from the desired tag although the output of the D-Cache 4 happens to equal the desired data. The load is then treated as an access to a different page and results in a data cache miss. The data memory 6, however, compares only data, so it judges a hit even when its output merely happens to equal the desired data.
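The difference between the tag comparison of comparator 9 and the data comparison of comparator 10 can be sketched as follows. This is an illustration only, not taken from the specification: the function names and the tag and data values are made up, and each memory is reduced to a single entry.

```python
def tag_compare_hit(stored_tag: int, requested_tag: int) -> bool:
    """D-Cache style check (comparator 9): the stored tag must match."""
    return stored_tag == requested_tag


def data_compare_hit(stored_data: int, correct_data: int) -> bool:
    """Data memory style check (comparator 10): only data is compared."""
    return stored_data == correct_data


# One entry holds the value 0, cached earlier for tag 0x12 (made-up values).
entry_tag, entry_data = 0x12, 0
# A load to another page (tag 0x34) whose correct value also happens to be 0.
requested_tag, correct_data = 0x34, 0

print(tag_compare_hit(entry_tag, requested_tag))   # tags differ
print(data_compare_hit(entry_data, correct_data))  # data coincides
```

In this made-up case a tagged memory reports a miss while the data-only comparison reports a hit, which is the effect the paragraph above describes.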

[0031] Because the comparator 10 performs the hit check of the data memory 6 only after the D-Cache 4 has output its data, waiting for the comparison result before executing subsequent instructions would waste the speed of the data memory 6.

[0032] In this embodiment, therefore, the output of the data memory 6 is used as the result of the load instruction without waiting for the hit check of the data memory 6, and the subsequent instructions are executed speculatively. This shortens the load latency and reduces bubbles in the pipeline, improving processor performance.

[0033] The load/store unit of FIG. 1 is controlled by a control unit 11. The control unit 11 contains a pipeline control unit 12 and a control register 13 that switches whether use of the data memory 6 is permitted. The pipeline control unit 12 changes the timing at which subsequent instructions are fed into the pipeline according to whether the data memory 6 is used.

[0034] The contents of the control register 13 can be set by a program. Providing the control register 13 allows the data memory 6 to be used efficiently. For example, when it is known that nearby addresses will be accessed intensively, speculative execution is unlikely to fail, so use of the data memory 6 is permitted. Switching use of the data memory 6 on or off according to the kind of program being executed in this way improves the processing performance of the processor.

[0035] Next, the refill operation of the data memory 6 is described. If the data memory 6 misses when a load instruction is executed, the result of that load instruction and the subsequent instructions are discarded. In parallel, the data memory 6 is refilled.

[0036] Specifically, if the D-Cache 4 hits when the data memory 6 misses, the hit entry of the D-Cache 4 is written to (or overwrites) the entry of the data memory 6 designated by the corresponding index. If the D-Cache 4 also misses when the data memory 6 misses, data from the external memory is refilled into both the data memory 6 and the D-Cache 4.

[0037] Each entry of the data memory 6 may be given a valid bit indicating whether the refill of that entry has completed normally. All valid bits are cleared at reset and set to 1 on refill. When a load instruction accesses an entry of the data memory 6 whose valid bit is 0, the refill of the data memory 6 has not yet finished, so the start of subsequent instructions that depend on the load can be delayed.

[0038] Providing the valid bit in this way avoids erroneous speculative execution and the cycles needed to recover from it, improving processor performance.
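The load-side behavior described so far (speculation on the data memory output, squash and refill on a miss, and the optional valid bit) can be summarized in a behavioral sketch. It is an illustration under stated assumptions rather than the specification's implementation: the class and method names are hypothetical, the index is taken as the address modulo the number of entries, and the argument `correct_data` stands for the value that the D-Cache 4 (or external memory, after a D-Cache miss) delivers a few cycles later.

```python
from dataclasses import dataclass


@dataclass
class L0Entry:
    data: int = 0
    valid: bool = False  # cleared at reset, set to 1 on refill


class L0DataMemory:
    """Tagless data memory model; timing is not modeled."""

    def __init__(self, num_entries: int = 4):
        self.entries = [L0Entry() for _ in range(num_entries)]

    def index(self, addr: int) -> int:
        # Assumption: the low-order address bits select the entry.
        return addr % len(self.entries)

    def load(self, addr: int, correct_data: int) -> str:
        entry = self.entries[self.index(addr)]
        if not entry.valid:
            outcome = "wait"    # entry not filled: hold dependents, no speculation
        elif entry.data == correct_data:
            outcome = "hit"     # speculative result turned out to be right
        else:
            outcome = "squash"  # discard the load result and its dependents
        if outcome != "hit":
            # Refill the indexed entry with the correct data.
            entry.data, entry.valid = correct_data, True
        return outcome


l0 = L0DataMemory()
print(l0.load(8, 7), l0.load(8, 7), l0.load(12, 9), l0.load(12, 9))
```

Addresses 8 and 12 map to the same entry in this four-entry model, so the third access finds stale data and is squashed before the entry is refilled.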

[0039] Next, the store operation of the data memory 6 is described. When a store instruction is issued, the index and data are first held temporarily in the store buffer 8 of FIG. 1. Once writing to the D-Cache 4 becomes possible, that is, once it is known that neither the store instruction nor any preceding instruction raises an exception and that every prediction (branch prediction and the like) used to execute the store instruction and the preceding instructions was correct, the data in the store buffer 8 is stored in the D-Cache 4 entry designated by the index in the store buffer 7.

[0040] The data memory 6, on the other hand, has no dedicated store buffer. When a store instruction is issued, the data is written unconditionally to the entry of the data memory 6 designated by the store's target index. If the data memory 6 has the valid bit described above, the valid bit is set to 1.

[0041] Thus the D-Cache 4 makes a hit/miss decision when executing a store instruction as well, and on a miss a refill occurs, depending on the protocol. The data memory 6, by contrast, makes no hit/miss decision when executing a store instruction and writes the data without a refill. Executing store instructions may therefore lower the hit rate of the data memory 6. Whether the hit rate actually falls depends on the program.

[0042] It is therefore desirable to implement both a mode in which the data memory 6 is updated when a store instruction is issued and a mode in which store instructions do not update the data memory 6, and to make the two modes switchable through a control register setting.

[0043] FIG. 2 is a block diagram of a load/store unit showing an example in which a control register 14 switches whether the data memory 6 is updated when a store instruction is issued. According to the value set in the control register 14, a data memory control unit 16 switches whether the data memory 6 is updated when a store instruction is executed. The control register 14 can be set by a program. Switching whether the data memory 6 is updated on store execution in this way suppresses the drop in the hit rate of the data memory 6.
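A minimal sketch of the two store modes follows. It is illustrative only: the function name, the dictionary layout of an entry, and the Boolean `update_on_store` (standing in for the setting of control register 14) are assumptions, not details given in the specification.

```python
def store_to_data_memory(entries: list, index: int, data: int,
                         update_on_store: bool) -> bool:
    """Apply a store to the data memory model; return True if it was written."""
    if not update_on_store:
        # Mode in which stores leave the data memory untouched.
        return False
    # Unconditional write: no hit/miss check and no refill.
    # The valid bit, if implemented, is set to 1.
    entries[index] = {"data": data, "valid": True}
    return True


entries = [{"data": 0, "valid": False} for _ in range(4)]
store_to_data_memory(entries, 2, 0xAB, update_on_store=False)
print(entries[2])
store_to_data_memory(entries, 2, 0xAB, update_on_store=True)
print(entries[2])
```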

[0044] A lock mechanism may also be added to the data memory 6 of this embodiment. The lock mechanism is a function that adds a no-overwrite attribute to an entry of the data memory 6. For a locked entry, the write data is ignored even if a refill of the data memory 6 occurs, and the contents of the data memory 6 do not change.

[0045] FIG. 3(a) shows the data structure of a data memory without the lock mechanism. In FIG. 3(a), the valid bit B1 described above is provided in addition to the data itself. FIG. 3(b) shows the data structure of a data memory with the lock mechanism. In FIG. 3(b), in addition to the data itself, there are the valid bit B1 and a lock bit B2 that sets whether the entry of the data memory 6 is locked.

[0046] When the lock bit B2 is set, rewriting of the data memory 6 is prohibited, so providing the lock bit B2 allows the data memory 6 to be used as a temporary memory accessible by load/store instructions.
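The entry layout of FIG. 3(b) and the effect of the lock bit on a refill can be sketched as below. The field and function names are hypothetical, and the sketch covers only the refill path that the text describes.

```python
from dataclasses import dataclass


@dataclass
class LockableEntry:
    data: int = 0
    valid: bool = False   # valid bit B1
    locked: bool = False  # lock bit B2


def refill(entry: LockableEntry, new_data: int) -> bool:
    """Try to refill an entry; return True if the entry was updated."""
    if entry.locked:
        return False      # write data is ignored, contents stay as they are
    entry.data = new_data
    entry.valid = True
    return True


entry = LockableEntry(data=42, valid=True, locked=True)
print(refill(entry, 99), entry.data)
```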

[0047] As another implementation for using the data memory 6 as a temporary memory, dedicated instructions for reading and writing the data memory 6 may be provided.

[0048] As described above, in the first embodiment a data memory 6 that is faster and smaller than the D-Cache 4 is provided as a subset of the D-Cache 4, and when a load instruction is executed the subsequent instructions are speculatively executed using the data read from the data memory 6 without waiting for the hit detection results of the D-Cache 4 and the data memory 6. The load latency is therefore shortened and processor performance improved.

[0049] Furthermore, because the control register 13 can switch whether the data memory 6 is used, a program can specify whether to use the data memory 6, and speculative execution failures can be avoided in advance.

[0050] In addition, because the control register 14 can switch whether the data memory 6 is updated when a store instruction is executed, the drop in the hit rate of the data memory 6 can be suppressed.

[0051] (Second Embodiment) The second embodiment is characterized by a hit history table for the data memory, provided to raise the probability that speculative execution succeeds.

[0052] FIG. 4 is a block diagram showing the schematic configuration of the load/store unit built into the processor of the second embodiment. In addition to the configuration of FIG. 1, the load/store unit of FIG. 4 has a data memory hit history table (DHT: Data Memory Hit History Table, history information storage unit) 21 that stores past hit/miss information for the data memory 6 at the time load/store instructions were issued. The DHT 21 stores history information indicating whether load/store instructions issued in the past hit in the data memory 6.

[0053] In the example of FIG. 4, the entries of the DHT 21 are defined in correspondence with the entries of the data memory 6. As shown in FIG. 5, the DHT 21 is a state machine 22 with four states, ST0, ST1, ST2, and ST3, and moves among the states ST0 to ST3 according to the hit/miss results of the data memory 6.

[0054] In the initial state (at reset) the state is set to ST0, and each time the data memory 6 hits the state moves up by one. Each time the data memory 6 misses, the state moves down by one. For example, if the data memory 6 hits in state ST1, the state moves to ST2; if the data memory 6 then misses, the state returns to ST1.

[0055] In the example of FIG. 5, ST3 is the highest state. State ST3 indicates that the data memory 6 was hit three times in a row immediately beforehand, so the next access is also likely to hit. In this embodiment, therefore, speculative execution is performed only in state ST3.
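One DHT entry can be modeled as a four-state up/down counter, as sketched below. The class and method names are hypothetical. Holding the state at ST3 on a further hit and at ST0 on a further miss is an assumption, since the text specifies only single steps up and down between ST0 and ST3.

```python
class HitHistoryEntry:
    """One DHT entry modeled as a counter over states ST0..ST3."""

    LOWEST, HIGHEST = 0, 3

    def __init__(self) -> None:
        self.state = self.LOWEST  # ST0 at reset

    def record(self, hit: bool) -> None:
        if hit:
            self.state = min(self.state + 1, self.HIGHEST)  # assumed to hold at ST3
        else:
            self.state = max(self.state - 1, self.LOWEST)   # assumed to hold at ST0

    def should_speculate(self) -> bool:
        # This embodiment speculates only in the highest state.
        return self.state == self.HIGHEST


entry = HitHistoryEntry()
for outcome in (True, True, True, False, True):
    entry.record(outcome)
    print(f"ST{entry.state}", entry.should_speculate())
```

Requiring ST3 trades some missed opportunities right after reset for fewer squashed speculations on entries that miss often.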

[0056] As described above, the second embodiment provides the DHT 21, whose state changes according to the hit/miss results of the data memory 6, and chooses whether to speculatively execute instructions according to past hit/miss results. This raises the probability that speculative execution succeeds and improves processor performance.

[0057] The configuration of the state machine is not limited to that shown in FIG. 5. Also, although the embodiment above performs speculative execution only in the highest state, speculative execution may also be performed in specific states other than the highest.

[0058] (Third Embodiment) The data memory 6 described above may be used as a temporary memory; for example, the data memory 6 may be given the same function as a scratchpad RAM. In this case, a scratchpad RAM region is allocated in the main memory space, and when a load/store instruction to this region is issued, the data memory 6 is accessed using the index.
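The address decoding implied by the scratchpad mode can be sketched as follows. Everything concrete here is hypothetical: the base address, the region size, the word size, and the function name are chosen only for illustration and do not appear in the specification.

```python
SCRATCHPAD_BASE = 0x1000_0000  # hypothetical start of the scratchpad region
SCRATCHPAD_SIZE = 256          # hypothetical region size in bytes
WORD_BYTES = 4                 # hypothetical entry width


def scratchpad_index(addr: int):
    """Return the data memory index for an address in the scratchpad
    region, or None if the access should take the normal cache path."""
    if SCRATCHPAD_BASE <= addr < SCRATCHPAD_BASE + SCRATCHPAD_SIZE:
        return (addr - SCRATCHPAD_BASE) // WORD_BYTES
    return None


print(scratchpad_index(0x1000_0010))
print(scratchpad_index(0x2000_0000))
```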

[0059] FIG. 6 is a block diagram showing the schematic configuration of a load/store unit with the scratchpad RAM function. The control unit 11 of FIG. 6 has the control register 13, which sets whether use of the data memory 6 is permitted; the control register 14, which sets whether data is written to the data memory 6 when a store instruction is issued; and a control register 15, which sets whether the data memory 6 is given the scratchpad RAM function.

[0060] The data memory 6 of FIG. 6 is configured to be accessible by DMA (Direct Memory Access). That is, when the data memory 6 is used as a scratchpad RAM, data can be exchanged with the external memory independently of pipeline operation.

[0061] Giving the data memory 6 the scratchpad RAM function in this way allows large amounts of data to be processed at high speed. Moreover, if the data memory 6 is divided into multiple regions, with only some regions used as scratchpad RAM and the others used in the same way as the D-Cache 4, external memory accesses and pipeline processing can proceed simultaneously, improving the processing performance of the processor.

[0062] Recent D-Caches 4 have a large memory capacity, which inevitably restricts their placement on the floor plan. This increases the wiring delay in the data path of load/store instructions and hinders improvement of the processor's operating frequency.

[0063] In contrast, the data memory 6 described in the above embodiments has a smaller memory capacity than the D-Cache 4 and also occupies a physically smaller mounting area, so the data memory 6 can be placed inside the arithmetic unit.

[0064] Therefore, when the data memory 6 of this embodiment is mounted in a processor, it can be placed inside the arithmetic unit or close to it, which reduces the wiring delay and allows the operating frequency to be raised accordingly.

[0065] FIG. 7 is a layout diagram of a processor in which the data memory 6 is placed inside the arithmetic unit. In FIG. 7, the distance between the data memory 6 and the operand bypass path is short, so the wiring delay is suppressed and the operating frequency can be improved.

[0066] In FIG. 1 and elsewhere, an example was described in which the comparison circuit 10 compares all bits of the output of the data memory 6 with the output of the D-Cache 4. However, because the memory access data width differs from instruction to instruction, the bit width compared by the comparison circuit 10 may be made selectable.

[0067] FIG. 8 shows an example in which a byte mask signal, which specifies that some bits are to be excluded from the comparison, is input to the comparison circuit (bit selection circuit) 10. In FIG. 8, the output of the data memory 6 and the output of the D-Cache 4 are compared only for the bits that are not masked by the byte mask signal.

[0068] With the configuration of FIG. 8, the bit width over which the output of the data memory 6 is compared with the output of the cache memory can be varied according to the memory access data width of the instruction, so a correct comparison is made regardless of the type of instruction.
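The masked comparison can be sketched in Python as follows. This is only an illustrative model under stated assumptions: the 8-byte data width, the encoding in which a set bit in the byte mask excludes the corresponding byte, and the function names are hypothetical and not taken from the source.

```python
def expand_byte_mask(byte_mask: int, num_bytes: int = 8) -> int:
    """Expand a per-byte mask into a per-bit mask.

    Assumed encoding: bit i of byte_mask set means byte i is excluded
    from the comparison.
    """
    bit_mask = 0
    for i in range(num_bytes):
        if (byte_mask >> i) & 1:
            bit_mask |= 0xFF << (8 * i)
    return bit_mask


def masked_equal(dm_out: int, dcache_out: int, byte_mask: int, num_bytes: int = 8) -> bool:
    """Compare the data memory output with the D-Cache output on unmasked bits only."""
    width_mask = (1 << (8 * num_bytes)) - 1
    compare_bits = ~expand_byte_mask(byte_mask, num_bytes) & width_mask
    # XOR marks differing bits; only differences in unmasked positions count as a mismatch.
    return ((dm_out ^ dcache_out) & compare_bits) == 0
```

For a one-byte load under these assumptions, a byte mask of `0xFE` excludes the upper seven bytes, so two 64-bit words that agree only in their lowest byte compare as equal, while a mask of `0x00` compares all 64 bits.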

【００６９】[0069]

[Effects of the Invention] As described in detail above, the present invention provides a data memory 6 that has a smaller memory capacity than the cache memory and a faster access speed than the cache memory. This shortens the latency of load instructions and reduces bubbles in the pipeline, which improves the processing performance of the processor.

[Brief Description of the Drawings]

[FIG. 1] A block diagram showing a schematic configuration of a load store unit built into the processor of the first embodiment.

[FIG. 2] A block diagram of a load store unit showing an example in which a control register switches whether the data memory 6 is updated when a store instruction is issued.

[FIG. 3] (a) is a data configuration diagram of a data memory without a lock mechanism, and (b) is a data configuration diagram of a data memory with a lock mechanism.

[FIG. 4] A block diagram showing a schematic configuration of a load store unit built into the processor of the second embodiment.

[FIG. 5] A diagram showing an example of a state machine.

[FIG. 6] A block diagram showing a schematic configuration of a load store unit having a scratch pad RAM function.

[FIG. 7] A layout diagram of a processor in which the data memory is placed inside the arithmetic unit.

[FIG. 8] A diagram showing an example in which a byte mask signal, which specifies that some bits are to be excluded from the comparison, is input to the comparison circuit.

[FIG. 9] A diagram showing an example of a pipeline in which the latency of a load instruction is 2.

[FIG. 10] A diagram showing an example of a pipeline in which the latency of a load instruction is 3.

[FIG. 11] A block diagram showing a schematic configuration of a load/store instruction execution unit of a conventional processor.

[FIG. 12] A layout diagram showing an example of a floor plan of a processor incorporating the load/store instruction execution unit of FIG. 11.

[Explanation of Symbols]

1: General-purpose register
2: Adder
3: TLB
4: Data cache memory (D-Cache)
5: Data tag memory (D-tag)
6: Data memory (L0 memory)
7, 8: Store buffers
9, 10: Comparators
11: Control unit
12: Pipeline control unit
13, 14, 15: Control registers
21: Data memory hit history table (DHT: Data Memory Hit History Table)
22: State machine

Continuation of front page
(51) Int. Cl.⁷: G06F 9/34 (identification symbol 350), G06F 9/38 (identification symbol 310)
FI: G06F 9/34 350A, G06F 9/38 310A
F-terms (reference): 5B005 JJ11 KK12 MM01 MM21 NN12 NN66 QQ06 VV04; 5B013 AA01 AA02 AA05 AA12 BB18 CC07; 5B033 AA13 DB06 DB07 DB08

Claims


[Claim 1] A processor comprising: a cache memory that temporarily stores data to be stored in an external memory or data to be read from the external memory; a data memory that stores at least part of the data stored in the cache memory, has a smaller memory capacity than the cache memory, and has a faster access speed than the cache memory; and an instruction control unit that accesses the cache memory via a tag unit storing addresses corresponding to the stored data, and accesses the data memory without going through the tag unit.

[Claim 2] The processor according to claim 1, further comprising a comparison circuit that compares data read from the data memory with data read from the cache memory when a load instruction is issued, wherein the instruction control unit determines that the data memory has been hit when the comparison result of the comparison circuit indicates a match, and determines that the data memory has been missed when the comparison result indicates a mismatch.

[Claim 3] The processor according to claim 2, further comprising a bit selection circuit capable of selecting the bit width to be compared by the comparison circuit, wherein the comparison circuit compares only the bits selected by the bit selection circuit.

[Claim 4] The processor according to claim 2 or 3, wherein, once data corresponding to an issued load instruction has been read from the data memory, the instruction control unit executes subsequent instructions before the comparison result of the comparison circuit is obtained.

[Claim 5] The processor according to any one of claims 1 to 4, wherein, when the data memory is missed upon issuance of a load instruction, the instruction control unit discards the execution results of that load instruction and its subsequent instructions and, in parallel, performs refill processing of the data memory.

[Claim 6] The processor according to any one of claims 1 to 5, wherein, when a store instruction is issued, the instruction control unit stores the corresponding data in the entry of the data memory indicated by the access destination index of the store instruction, regardless of whether the data memory has been hit.

[Claim 7] The processor according to any one of claims 1 to 6, further comprising a history information storage unit that stores history information on hit/miss results of the data memory, wherein the instruction control unit determines whether to permit use of the data memory based on the history information in the history information storage unit.

[Claim 8] The processor according to claim 7, wherein the instruction control unit permits use of the data memory only when the history information storage unit detects that the data memory has been hit a plurality of consecutive times immediately before.

[Claim 9] The processor according to any one of claims 1 to 8, further comprising a first switching control unit that switches whether use of the data memory is permitted.

[Claim 10] The processor according to any one of claims 1 to 9, further comprising an arithmetic unit mounted on the same chip as the cache memory and the data memory, wherein the data memory is mounted closer to the arithmetic unit than the cache memory is, or inside the arithmetic unit.

[Claim 11] The processor according to any one of claims 1 to 10, further comprising a second switching control unit that switches whether the data memory is used as a RAM capable of DMA (Direct Memory Access) data transfer to and from an external memory.