JPH03288246A

JPH03288246A - Instruction cache memory

Info

Publication number: JPH03288246A
Application number: JP2089954A
Authority: JP
Inventors: Seiji Yamaguchi; 山口　聖司
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1990-04-04
Filing date: 1990-04-04
Publication date: 1991-12-18

Abstract

PURPOSE: To quickly determine (schedule) the instruction processing order in a microprocessor that can process a plurality of instructions simultaneously in parallel, by adding to the cache memory instruction classification bits that indicate the type of each instruction. CONSTITUTION: One entry of the instruction cache memory comprises a tag part TA composed of at least part of an address, a memory part Ii that stores instructions, and instruction classification bits Ci that indicate the type of each stored instruction. With this configuration, when an execution unit of the microprocessor requests instructions and the instruction cache memory is accessed, the instruction classification bits, which classify in advance the kind of processing each instruction performs, are read out together with the instructions. The instruction classification bits can then be used immediately to determine (schedule) which instruction to supply to which execution unit, so a faster machine cycle is attained.

Description

Detailed Description of the Invention

Field of Industrial Application

The present invention is applicable to a microprocessor that incorporates a large-capacity instruction cache memory and processes a plurality of instructions simultaneously in parallel.

Prior Art

One technique for improving microprocessor performance is to build in a large-capacity cache memory, and designs increasingly do so. A large built-in cache memory raises the cache hit rate, which greatly reduces the number of times the microprocessor accesses main memory and thereby improves performance.

The cache memory built into a microprocessor is of two kinds: data cache memory and instruction cache memory. For cache memories of the same capacity, the instruction cache generally has a higher hit rate than the data cache. This means that instructions have higher locality than data.

Another technique for improving microprocessor performance is to process a plurality of instructions in parallel. Recently, microprocessors have appeared that have an execution unit that executes register operation instructions, an execution unit that executes memory access instructions (for example, load/store instructions), an execution unit that executes control instructions such as branches, and an execution unit that executes floating-point operations, with these execution units able to operate in parallel. Such a microprocessor is called superscalar.

Problem to Be Solved by the Invention

In the superscalar approach described above, machine cycles are needed to decode the instructions and to perform scheduling, that is, to determine which instruction should be executed in which order and by which execution unit, and it has been difficult to speed up the scheduling operation.

The present invention has been made in view of this problem. Its object is to provide an instruction cache memory in which an instruction classification bit indicating the type of instruction is added, for each instruction, to the large-capacity instruction cache memory built into a microprocessor, so that the instruction processing procedure of a microprocessor capable of processing a plurality of instructions simultaneously in parallel can be determined (scheduled) at high speed.

Means for Solving the Problem

To solve the above problem, the present invention is an instruction cache memory in which one entry comprises a tag part composed of at least part of an address, a memory part that stores instructions, and instruction classification bits that indicate the type of instruction for each stored instruction.

Operation

With the above configuration, when an execution unit of the microprocessor requests instructions from the instruction cache memory and the instruction cache memory is accessed, the instruction classification bits, which classify in advance what kind of processing each read instruction performs, are read out at the same time. The instruction classification bits can be used immediately to determine which instruction should be supplied to which execution unit (scheduling), so a faster machine cycle can be realized.

Embodiment

FIG. 1 shows the configuration of one entry of the instruction cache memory of the present invention. In FIG. 1, TA is the tag address and V is a valid bit indicating whether the entry is valid or invalid. For i = 0 to 3, Ii is the instruction stored in line i, and Ci is the instruction classification bits that specify which of the plural execution units executes the instruction stored in line i. Together these make up one entry of the instruction cache. In this case the tag part consists of the tag address TA and the valid bit V, while the data part consists of the instructions Ii and the instruction classification bits Ci.

For example, suppose the microprocessor has five execution units: a register operation instruction execution unit, a branch execution unit, a memory access execution unit, a floating-point add/subtract execution unit, and a floating-point multiply/divide unit. The instruction classification bits Ci (i = 0 to 3) are written when the instruction Ii (i = 0 to 3) is written, with the following values:

Ci = '1' if the instruction being written is a register operation instruction
Ci = '2' if it is a branch instruction
Ci = '3' if it is a memory access instruction
Ci = '4' if it is a floating-point add/subtract instruction
Ci = '5' if it is a floating-point multiply/divide instruction
Ci = '0' if it is an instruction not defined in the instruction set architecture
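The entry layout of FIG. 1 and the classification codes listed above can be sketched as follows. This is a minimal illustration, not the patent's circuit: the opcode field position and the `OPCODE_CLASS` table are hypothetical (the description names no concrete instruction encoding), and the names `InstrClass`, `classify`, and `CacheEntry` are introduced here only for illustration.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class InstrClass(IntEnum):
    """Classification codes Ci as listed in the description."""
    UNDEFINED = 0      # not defined in the instruction set architecture
    REGISTER_OP = 1    # register operation instruction
    BRANCH = 2         # branch instruction
    MEMORY_ACCESS = 3  # memory access (load/store) instruction
    FP_ADD_SUB = 4     # floating-point add/subtract instruction
    FP_MUL_DIV = 5     # floating-point multiply/divide instruction


# Hypothetical opcode map: an invented ISA whose top 6 bits hold the opcode.
OPCODE_CLASS = {
    0x01: InstrClass.REGISTER_OP,
    0x02: InstrClass.BRANCH,
    0x03: InstrClass.MEMORY_ACCESS,
    0x04: InstrClass.FP_ADD_SUB,
    0x05: InstrClass.FP_MUL_DIV,
}


def classify(instruction: int) -> InstrClass:
    """Derive Ci from a 32-bit instruction word (hypothetical encoding)."""
    opcode = (instruction >> 26) & 0x3F
    return OPCODE_CLASS.get(opcode, InstrClass.UNDEFINED)


@dataclass
class CacheEntry:
    """One entry as in FIG. 1: tag part (TA, V) plus data part (Ii, Ci)."""
    tag: int = 0
    valid: bool = False
    instructions: List[int] = field(default_factory=lambda: [0] * 4)
    classes: List[InstrClass] = field(
        default_factory=lambda: [InstrClass.UNDEFINED] * 4)

    def fill(self, tag: int, words: List[int]) -> None:
        """Write four instructions together with their classification bits."""
        self.tag = tag
        self.valid = True
        self.instructions = list(words)
        self.classes = [classify(w) for w in words]


entry = CacheEntry()
entry.fill(tag=0x12, words=[0x01 << 26, 0x02 << 26, 0x04 << 26, 0x3F << 26])
print([c.name for c in entry.classes])
```

The point of the layout is that `classes` is computed once, when the entry is written, so a later read returns the instruction and its class together.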

With one entry configured as above, when an execution unit of the microprocessor requests instructions from the instruction cache memory and the cache is accessed, the type of each instruction is detected at the same time as the instruction is read, so the execution unit that should execute it is already specified. Undefined instructions are detected at the same time as well, so exception handling can also be started quickly. Because five execution units are assumed here, the instruction classification bits consist of 3 bits.

FIG. 1 shows the case where one entry consists of four lines. When the number of lines per entry is not four, a similar configuration is possible by defining instruction classification bits for each instruction according to the number of lines.

FIG. 2 is a block diagram of the main part of a microprocessor equipped with the instruction cache memory of the present invention. In FIG. 2, 1 is the instruction cache memory, 2 is an instruction register, 4 is instruction decoding means, 6 is an instruction selector, 8 is scheduling logic means, and 10, 12, 14, 16, and 18 are execution units. The instruction cache memory 1 reads out four instructions (I0, I1, I2, I3) simultaneously and transfers them to the instruction register 2. The instruction decoding means 4 decodes the source and destination resources of the operations for the instructions in the instruction register 2 and generates the register addresses R0, R1, R2, and R3 to be used. The register addresses R0, R1, R2, R3 and the instruction classification bits C0, C1, C2, C3 are input to the scheduling logic means 8, which detects whether a conflict over source or destination resources (also called register interference) occurs and generates control signals Si that select which instruction in the instruction register 2 is to be transferred to which execution unit. The control signals Si are input to the instruction selector 6, which selectively transfers the instructions held in the instruction register 2 to the execution units 10, 12, 14, 16, and 18. Each execution unit then begins executing the instruction transferred to it from the instruction selector 6.

For example, if the four instructions held in the instruction register 2 are two register operation instructions, a floating-point add/subtract instruction, and a branch instruction, at most three instructions can execute simultaneously, because the register operation execution unit can execute only one register operation instruction at a time. The case described here has more execution units than supplied instructions (the number of instructions held in the instruction register 2), but in principle the scheduling logic means 8 determines from the combination of instructions in the instruction register 2 which of them can execute in parallel, so up to four instructions can execute in parallel.

FIG. 3 is a block diagram showing the specific configuration of a first embodiment of the instruction cache memory of the present invention. The case described here is one in which each instruction has a fixed length of 32 bits, write operations write four instructions at a time into the cache memory, and read operations read four instructions at a time.

In FIG. 3, 20 is the main part of the cache memory; 22 is a memory array that stores the tag addresses TA; 24 is a memory array that stores the valid bits V; 26-i (i = 0 to 3) are memory arrays that store the instructions Ii; 28-i (i = 0 to 3) are memory arrays that store the instruction classification bits Ci; 30 is a row decoder; 32, 34, 36, and 38 are tri-state buffers for writing; 40 is logic means that detects which execution unit executes the instruction being written and generates the instruction classification bits to be written; 42 is a comparator that compares the tag address TA with the upper bits of the address A; 44 is an AND gate that takes the logical AND of that comparison result and the valid bit V to generate the hit signal HT of the instruction cache memory; and 46 and 48 are tri-state buffers that output the instructions RIi (i = 0 to 3) and the instruction classification bits RCi (i = 0 to 3) read out when the instruction cache memory hits.

FIGS. 4(a) and 4(b) show operation waveforms for the write operation and the read operation of FIG. 3. The write and read operations are described below with reference to FIG. 3 and FIGS. 4(a) and 4(b), for the case where clocks PH1 and PH2 are input to the main part 20 of the instruction cache memory and it operates in synchronization with them.

First, in an instruction write operation (WRD is 'H'), the address A is input to the row decoder 30 in synchronization with clock PH1, and the row address N to be selected is determined. During the PH1 period, the memory arrays 22, 24, 26, and 28 are held in the precharge state. At clock PH2, the word line W(N) corresponding to the selected row address N rises and the memory cells are accessed. The instruction WI to be written is transferred in synchronization with clock PH1 and written at clock PH2 into the memory cells selected by the word line W(N). At this time the logic means 40 determines which execution unit executes the instruction transferred in synchronization with clock PH1, and the determination result WCi is written as the instruction classification bits Ci at clock PH2.
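The role of the scheduling logic means 8 described for FIG. 2 above can be illustrated with a small sketch. Several things here are assumptions rather than statements from the description: the pairing of classification codes with the reference numerals 10 to 18, the policy of skipping a blocked instruction and trying the next one, and the conservative interference check. The function name `schedule` and its argument layout are hypothetical.

```python
from typing import Dict, List, Set

# Assumed pairing of classification code -> execution unit numeral in FIG. 2.
# The description lists five units and five numerals but does not pair them.
UNIT_FOR_CLASS = {1: 10, 2: 12, 3: 14, 4: 16, 5: 18}


def schedule(classes: List[int],
             dests: List[Set[int]],
             srcs: List[Set[int]]) -> Dict[int, int]:
    """Return {instruction slot: execution unit} for one issue group.

    classes[i] is the classification code Ci read from the cache;
    dests[i] and srcs[i] are the register numbers decoded for slot i.
    """
    issued: Dict[int, int] = {}
    busy_units: Set[int] = set()
    earlier_writes: Set[int] = set()
    earlier_reads: Set[int] = set()

    for slot, code in enumerate(classes):
        unit = UNIT_FOR_CLASS.get(code)  # None for code 0 (undefined)
        # Conservative register interference check against every earlier
        # slot in the group, whether or not that slot was issued.
        interference = bool(
            (srcs[slot] & earlier_writes)
            or (dests[slot] & earlier_writes)
            or (dests[slot] & earlier_reads)
        )
        if unit is not None and unit not in busy_units and not interference:
            issued[slot] = unit
            busy_units.add(unit)
        earlier_writes |= dests[slot]
        earlier_reads |= srcs[slot]
    return issued


# The example from the text: two register operations, one floating-point
# add/subtract, and one branch, with no register interference among them.
group = schedule(
    classes=[1, 1, 4, 2],
    dests=[{1}, {2}, {10}, set()],
    srcs=[{3, 4}, {5, 6}, {11, 12}, set()],
)
print(group)
```

Because the codes arrive together with the instructions, the unit lookup is a table index rather than an opcode decode, which is the saving the description attributes to the classification bits. In the example, the second register operation finds its unit busy, so three of the four instructions are selected.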

Next, the instruction read operation (WRD is 'L') is described. The address A is input to the row decoder 30 in synchronization with clock PH1, and the row address to be selected is determined. During the PH1 period, the memory arrays 22, 24, 26, and 28 are held in the precharge state.

At clock PH2, the word line W(N) corresponding to the selected row address N rises and the memory cells are accessed. The tag address TA that is read out is compared with the upper bits of the address A by the comparator 42. The comparator 42 outputs 'H' if the tag address TA and the upper bits of the address A match, and 'L' if they do not. The AND gate 44 takes the logical AND of the output of the comparator 42 and the valid bit V to generate the hit signal HT. The hit signal HT becomes 'H' if the tag address TA matches the upper bits of the address A and the selected entry is valid. This enables the tri-state buffers 46 and 48, and the instructions RIi and the instruction classification bits RCi are transferred to the instruction register 2 and the scheduling logic means 8.
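The read path just described (row select, tag comparison, AND with the valid bit, buffer enable) can be modeled in a few lines. The address split is an assumption: the description fixes four 32-bit instructions per entry but gives no cache size or addressing granularity, so byte addressing and `INDEX_BITS = 8` are chosen here only for illustration, as is the tuple layout of each row.

```python
from typing import List, Optional, Tuple

OFFSET_BITS = 4  # assumed byte addressing: 4 instructions x 4 bytes per entry
INDEX_BITS = 8   # assumed: 256 rows; the description gives no cache size

# One row: (tag address TA, valid bit V, instructions Ii, class bits Ci)
Row = Tuple[int, bool, List[int], List[int]]


def read_access(
    rows: List[Row], addr: int
) -> Tuple[bool, Optional[List[int]], Optional[List[int]]]:
    """Return (HT, RIi, RCi); RIi and RCi are None when HT is low."""
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)  # row decoder 30
    upper = addr >> (OFFSET_BITS + INDEX_BITS)               # upper bits of A
    tag, valid, instructions, classes = rows[index]

    match = tag == upper   # comparator 42
    hit = match and valid  # AND gate 44 produces the hit signal HT
    if hit:
        # Tri-state buffers 46 and 48 enabled: both outputs are driven.
        return True, instructions, classes
    # Buffers disabled: nothing is transferred.
    return False, None, None


rows: List[Row] = [(0, False, [0] * 4, [0] * 4) for _ in range(1 << INDEX_BITS)]
rows[5] = (0x3, True, [11, 12, 13, 14], [1, 2, 3, 4])
print(read_access(rows, (0x3 << 12) | (5 << 4)))
```

Note that the classification bits come out of the same access as the instructions, gated by the same hit signal, so no extra step separates the read from the scheduling decision.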

On the other hand, if the tag address TA and the upper bits of the address A do not match, or if the selected entry is invalid, the hit signal HT becomes 'L'. This disables the tri-state buffers 46 and 48, so the instructions RIi and the instruction classification bits RCi are not transferred to the instruction register 2 or the scheduling logic means 8. The cache is then in a miss state, and the entry must be replaced.

The description above covers the case where the tag address TA and the upper bits of the address A match (WRT is 'L', or the cache memory is in a hit state). In a miss state, the entry is replaced first and the cache access is performed afterwards. To replace an entry, the tag address TA and the valid bit V (V = 'H') are written into the entry selected by the replacement algorithm. WRT is 'H' at this time. The embodiment above describes a direct-mapped cache, but the same can of course be achieved with a set-associative cache.

Effects of the Invention

As is clear from the above description, according to the present invention, when the instruction cache memory built into a superscalar microprocessor writes (stores) an instruction into its memory cells, it determines which execution unit executes that instruction and stores the result in the instruction classification bits. When an execution unit of the microprocessor requests instructions from the instruction cache memory and the access hits, the stored instructions and the instruction classification bit information are transferred to the instruction register and the scheduling logic means, and each execution unit executes instructions according to the values of the instruction classification bits. In a conventional superscalar design the instructions must be classified after they are read from the cache memory, whereas in the present invention the instruction classification bits are read at the same time as the instructions, so processing can begin immediately in each execution unit. This shortens the critical path inside the microprocessor and improves the machine cycle, realizing faster operation and higher performance of the microprocessor.
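The overall flow summarized above (classify when an entry is written, then read instructions and classification bits together on every later hit) can be modeled end to end. This is a simplified direct-mapped sketch under stated assumptions: addresses are given per entry rather than per byte, main memory is a plain dictionary, and `toy_classify` is a hypothetical classifier for an invented encoding. The counter `classify_calls` exists only to show that classification is not repeated on a hit.

```python
from typing import Callable, Dict, List, Tuple

LINES_PER_ENTRY = 4


def toy_classify(word: int) -> int:
    """Hypothetical classifier: the top 6 bits hold the class code 1..5."""
    opcode = word >> 26
    return opcode if 1 <= opcode <= 5 else 0


class InstructionCache:
    """Direct-mapped model: classification happens on fill, not on read."""

    def __init__(self, num_rows: int, classify: Callable[[int], int]) -> None:
        self.num_rows = num_rows
        self.classify = classify
        self.classify_calls = 0
        # Each row: [tag TA, valid V, instructions Ii, classification codes Ci]
        self.rows = [[0, False, [0] * LINES_PER_ENTRY, [0] * LINES_PER_ENTRY]
                     for _ in range(num_rows)]

    def _fill(self, index: int, tag: int, words: List[int]) -> None:
        """Replace an entry: write TA, V = 1, the instructions, and their codes."""
        codes = []
        for word in words:
            codes.append(self.classify(word))
            self.classify_calls += 1
        self.rows[index] = [tag, True, list(words), codes]

    def fetch(self, entry_addr: int,
              memory: Dict[int, List[int]]) -> Tuple[List[int], List[int]]:
        """Return (instructions, codes), replacing the entry first on a miss."""
        index = entry_addr % self.num_rows
        tag = entry_addr // self.num_rows
        row = self.rows[index]
        if not (row[1] and row[0] == tag):  # miss: HT is low
            self._fill(index, tag, memory[entry_addr])
            row = self.rows[index]
        return row[2], row[3]


memory = {7: [1 << 26, 2 << 26, 3 << 26, 0]}
cache = InstructionCache(num_rows=16, classify=toy_classify)
first = cache.fetch(7, memory)   # miss: entry replaced, four words classified
second = cache.fetch(7, memory)  # hit: codes read back, no classification
print(second[1], cache.classify_calls)
```

The second fetch returns the stored codes without invoking the classifier, which mirrors the claim that classification leaves the read path.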

Brief Description of the Drawings

FIG. 1 is a configuration diagram of one entry of the instruction cache memory of the present invention. FIG. 2 is a block diagram of the main part of a microprocessor equipped with the instruction cache of the present invention. FIG. 3 is a block diagram showing the configuration of the instruction cache memory of the first embodiment of the present invention. FIG. 4(a) is an operation waveform diagram showing the write operation of the instruction cache memory of the first embodiment of the present invention. FIG. 4(b) is an operation waveform diagram showing the read operation of the instruction cache memory of the first embodiment of the present invention.

1 ... instruction cache memory
2 ... instruction register
4 ... instruction decoding means
6 ... instruction selector
8 ... scheduling logic means
10, 12, 14, 16, 18 ... execution units
20 ... main part of the instruction cache memory
22, 24, 26, 28 ... memory arrays
30 ... row decoder
32, 34, 36, 38 ... tri-state buffers for writing
40 ... logic means for generating the instruction classification bits
42 ... comparator
44 ... AND gate
46, 48 ... tri-state buffers for reading

Claims

(1) An instruction cache memory characterized in that one entry of the instruction cache memory comprises a tag part composed of at least part of an address, a memory part that stores instructions, and instruction classification bits that indicate the type of instruction for each stored instruction.

(2) An instruction cache memory characterized in that, where the bit length of one instruction is fixed, one entry of the instruction cache memory comprises a tag part composed of at least part of an address, a memory part that stores a plurality of instructions, and instruction classification bits that indicate the type of instruction for each stored instruction.

(3) An instruction cache memory characterized in that, where the bit length of one instruction is fixed, one entry of the instruction cache memory comprises a tag part composed of at least part of an address, a memory part that stores a plurality of instructions, and instruction classification bits that indicate the type of instruction for each stored instruction, and in that the instruction classification bits are generated when an instruction is stored in the memory part.

(4) An instruction cache memory for a microprocessor capable of processing a plurality of instructions simultaneously in parallel, characterized in that one entry of the instruction cache memory comprises a tag part composed of at least part of an address, a memory part that stores instructions, and instruction classification bits that indicate the type of instruction for each of the stored instructions that are processed simultaneously in parallel.