WO2015024532A9 - System and method for caching high-performance instruction - Google Patents

System and method for caching high-performance instruction

Info

Publication number
WO2015024532A9
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
block
address
memory
branch
Prior art date
Application number
PCT/CN2014/085063
Other languages
French (fr)
Chinese (zh)
Other versions
WO2015024532A1 (en)
Inventor
林正浩
Original Assignee
上海芯豪微电子有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海芯豪微电子有限公司 filed Critical 上海芯豪微电子有限公司
Priority to US14/913,837 priority Critical patent/US20160217079A1/en
Publication of WO2015024532A1 publication Critical patent/WO2015024532A1/en
Publication of WO2015024532A9 publication Critical patent/WO2015024532A9/en
Priority to US15/722,814 priority patent/US10275358B2/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0875 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0864 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using pseudo-associative means, e.g. set-associative or hashing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0893 Caches characterised by their organisation or structure
    • G06F12/0897 Caches characterised by their organisation or structure with two or more cache hierarchy levels
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/45 Caching of specific data in cache memory
    • G06F2212/452 Instruction code

Definitions

  • the invention relates to the field of computers, communications and integrated circuits.
  • The role of a cache is to hold a copy of part of the contents of lower-level memory, so that higher-level memory or the processor core can access those contents quickly and the pipeline can keep running continuously.
  • Current caches are addressed as follows: the index field of the address selects a tag in the tag memory, and that tag is compared with the tag field of the address; the index field, together with the in-block offset field, addresses the cache to read out content.
  • If the tag read from the tag memory is the same as the tag field of the address, the content read from the cache is valid; this is called a cache hit. Otherwise, if the tag read from the tag memory is not the same as the tag field of the address, it is called a cache miss, and the content read from the cache is invalid.
  • In a multi-way set-associative cache, the above operations are performed in parallel for every way to detect which way hits. The content read from the hitting way is the valid content. If all ways miss, everything read is invalid. After a cache miss, the cache control logic fills the cache with content from the lower-level storage medium.
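The conventional lookup described above can be sketched as follows. This is a minimal illustration, not the patent's circuit: the field widths (`OFFSET_BITS`, `INDEX_BITS`), the class `SetAssociativeCache`, and its methods are hypothetical names chosen for the example, and hardware would compare all ways in parallel rather than in a loop.

```python
OFFSET_BITS = 4   # assumption: 16-byte blocks
INDEX_BITS = 3    # assumption: 8 rows per way


def split_address(addr):
    """Split an address into (tag, index, offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class SetAssociativeCache:
    def __init__(self, ways=2):
        # tags[way][index] holds the stored tag, or None if the row is empty.
        self.tags = [[None] * (1 << INDEX_BITS) for _ in range(ways)]

    def fill(self, addr, way):
        """Record that the block containing addr now lives in the given way."""
        tag, index, _ = split_address(addr)
        self.tags[way][index] = tag

    def lookup(self, addr):
        """Return the hitting way, or None on a cache miss."""
        tag, index, _ = split_address(addr)
        for way, way_tags in enumerate(self.tags):
            if way_tags[index] == tag:   # tag compare for this way
                return way
        return None
```

A miss (`None`) is the point at which a conventional cache would start filling from lower-level memory.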
  • Cache misses fall into three categories: compulsory misses, conflict misses, and capacity misses. In the prior art, compulsory misses are unavoidable except for the small portion covered by successful prefetches.
  • Modern cache systems are typically composed of multi-level caches that are multi-way set-associative.
  • Newer cache structures, such as victim caches, trace caches, and prefetching, build on the basic cache structure described above and improve it.
  • With current architectures, the various kinds of cache misses in particular have become the most serious bottleneck restricting the performance of modern processors.
  • the method and system apparatus proposed by the present invention can directly address one or more of the above or other difficulties.
  • The present invention provides a high-performance instruction cache method in which the processor core is connected to a first memory containing executable instructions and to a second memory that is faster than the first memory. The method comprises: examining instructions that are being filled from the first memory into the second memory, thereby extracting instruction information that includes at least branch information; establishing a plurality of tracks according to the extracted instruction information; and filling at least one instruction that may be executed by the processor core from the first memory into the second memory according to one or more of the tracks. In addition, the second memory is organized in a fully associative manner and the first memory in a set-associative manner.
  • the track is in one-to-one correspondence with the instruction block in the second memory.
  • the target address is addressed by the primary block number to determine if the target instruction belongs to a certain instruction block of the second memory.
  • The level 2 block number is written to the track table; when the instruction is filled from the first memory into the second memory, it is changed to the level 1 block number.
  • The flag bit of the corresponding block number in the active table is set; at the same time, the flag bits of the block numbers in the active table are reset in sequence, so that a set flag bit indicates a block number currently referenced by a track, and that block number is not replaced in the active table.
  • The present invention also provides a high-performance instruction cache system comprising: a processor core configured to execute instructions; a first memory configured to store instructions required by the processor core; a second memory, faster than the first memory, also configured to store instructions required by the processor core; a scanner configured to examine instructions being filled from the first memory into the second memory, thereby extracting instruction information that includes at least branch information; and a track table configured to store a plurality of tracks established according to the extracted instruction information. In addition, the second memory is organized in a fully associative manner and the first memory in a set-associative manner.
  • the tracks in the track table are in one-to-one correspondence with the instruction blocks in the second memory.
  • each instruction block in the second memory corresponds to a first-level block number.
  • The flag bit of the corresponding block number in the active table is set; at the same time, the flag bits of the block numbers in the active table are reset in sequence, so that a set flag bit indicates a block number currently referenced by the track table, and that block number is not replaced in the active table.
  • When the instruction block preceding or following, in sequential address order, an instruction block in the first memory is already stored in the first memory, the storage location in the first memory of that preceding or following instruction block is stored in the active table.
  • When an instruction is located in the instruction block preceding or following the current instruction block in the first memory, the instruction can be found directly in the first memory according to the location information of the preceding or following instruction block stored in the active table.
  • A boundary check is performed on the branch target instruction address; depending on the result, branch target instructions located at different positions are given addresses in different formats.
  • The level 2 block number of the instruction block preceding or following the instruction block in which the branch instruction is located is used as the level 2 block number of the branch target instruction, and the part of the branch target instruction address that is the offset within the first memory is used as the offset of the branch target instruction.
  • The active table content corresponding to the instruction being filled from the first memory into the second memory is stored in the micro active table. If the examination finds that the branch target instruction lies in a different level 1 instruction block within the same level 2 instruction block as the branch instruction, and the level 1 block number corresponding to that level 1 instruction block is valid in the micro active table, the level 1 block number read from the micro active table is used directly as the level 1 block number of the branch target instruction. If the branch target instruction lies in a different level 1 instruction block within the same level 2 instruction block but the corresponding level 1 block number in the micro active table is invalid, the level 2 block number of the branch instruction is used directly as the level 2 block number of the branch target instruction. If the branch target instruction lies in the level 2 instruction block preceding or following that of the branch instruction, and the corresponding level 2 block number is valid in the micro active table, the level 2 block number read from the micro active table is used directly as the level 2 block number of the branch target instruction.
  • A plurality of level 2 block numbers, together with the active table content corresponding to those block numbers, are stored in the micro active table. When the examination finds a branch target instruction, the branch target instruction address is first matched in the micro active table. If the match succeeds, the level 1 or level 2 block number read from the micro active table is used directly as the level 1 or level 2 block number of the branch target instruction; if the match fails, the branch target instruction address is then sent to the active table for matching.
  • The entries of the active table correspond one-to-one with the instruction blocks in the first memory, and each entry stores the block address of the corresponding instruction block in the first memory.
  • For the instruction block preceding or following, in sequential address order, an instruction block in the first memory, the active table further stores the storage location of that preceding or following instruction block in the first memory.
  • A boundary check is performed on the branch target instruction address; depending on the result, branch target instructions located at different positions are given addresses in different formats.
  • The system includes one or more adders. An adder adds the low-order bits of the branch instruction's own address, taken from the part above the offset corresponding to the first memory, to the corresponding bits of the branch displacement, and so determines whether the branch target instruction lies in the instruction block preceding or following, in sequential address order, the instruction block in which the branch instruction is located in the first memory. When the branch target instruction is located in the instruction block preceding or following the current instruction block in the first memory, the instruction can be found directly in the first memory according to the location information of the preceding or following instruction block stored in the active table.
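The boundary check can be illustrated with a short sketch. It is an assumption-laden simplification: `L2_OFFSET_BITS` (the width of the offset within one first-memory block) and the function `classify_target` are hypothetical, and the sketch adds full addresses, whereas the text describes adding only selected low-order bits.

```python
L2_OFFSET_BITS = 6  # assumption: 64-byte instruction blocks in the first memory


def classify_target(branch_addr, displacement):
    """Classify where a branch target lies relative to the branch's own block.

    Returns "same", "previous", "next", or "other".
    """
    target = branch_addr + displacement
    # Compare block addresses, i.e. the bits above the in-block offset.
    delta = (target >> L2_OFFSET_BITS) - (branch_addr >> L2_OFFSET_BITS)
    if delta == 0:
        return "same"
    if delta == -1:
        return "previous"
    if delta == 1:
        return "next"
    return "other"
```

Only the "other" case would need a full match in the active table; the first three can use location information already at hand.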
  • The system further includes a micro active table, configured to store the active table content corresponding to the instruction being filled from the first memory into the second memory. When the scanner finds that the branch target instruction lies in a different level 1 instruction block within the same level 2 instruction block as the branch instruction, and the level 1 block number corresponding to that level 1 instruction block is valid in the micro active table, the level 1 block number read from the micro active table is used directly as the level 1 block number of the branch target instruction. If the branch target instruction lies in a different level 1 instruction block within the same level 2 instruction block but the corresponding level 1 block number in the micro active table is invalid, the level 2 block number of the branch instruction is used directly as the level 2 block number of the branch target instruction. If the branch target instruction lies in the level 2 instruction block preceding or following that of the branch instruction, and the corresponding level 2 block number is valid in the micro active table, the level 2 block number read from the micro active table is used directly as the level 2 block number of the branch target instruction.
  • The system further includes a micro active table, configured to store a plurality of level 2 block numbers together with the active table content corresponding to those block numbers. When the scanner finds a branch target instruction, the branch target instruction address is first matched in the micro active table. If the match succeeds, the level 1 or level 2 block number read from the micro active table is used directly as the level 1 or level 2 block number of the branch target instruction; if the match fails, the branch target instruction address is sent to the active table for matching.
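The two-step match (micro active table first, then the active table) might look like the sketch below. Both tables are modeled as plain dictionaries keyed by block address, and `resolve_target` is a hypothetical helper; the real structures are associative hardware tables.

```python
def resolve_target(block_addr, micro_active, active):
    """Look up a branch target block, trying the small table first.

    Both tables map a block address to a block number such as ("BNX1", 3)
    or ("BNX2", 9). Returns (where_found, block_number).
    """
    hit = micro_active.get(block_addr)
    if hit is not None:
        return ("micro", hit)        # fast path: no active table access
    hit = active.get(block_addr)
    if hit is not None:
        return ("active", hit)       # fallback: full active table match
    return ("miss", None)
```

The design point is that most targets are near the branch, so the small table can answer them without a full active table match.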
  • The system and method of the present invention can provide a fundamental solution for the cache structures used by digital systems. Unlike a conventional cache system, which fills only after a cache miss, the system and method of the present invention fill the instruction cache before the processor executes an instruction, and can completely hide compulsory misses.
  • The system and method of the present invention use a fully associative structure for the level 1 cache and a set-associative structure for the level 2 cache, achieving an effect substantially similar to a fully associative structure; this avoids capacity misses and also improves processor speed. Because the system and method of the present invention require fewer matching operations and have a lower miss rate, power consumption is also significantly lower than in conventional cache systems. Other advantages and applications of the present invention will be apparent to those skilled in the art.
  • FIG. 1 is a schematic diagram of the instruction prefetch structure of the present invention in which the level 2 cache is organized in multi-way set-associative form.
  • FIG. 3 is an embodiment of the relationship between level 1 instruction blocks, level 2 instruction blocks, and the corresponding storage units of the present invention.
  • FIG. 4 is a specific embodiment of the level 2 cache of the present invention in two-way set-associative form.
  • FIG. 5 is another specific embodiment of the level 2 cache of the present invention in two-way set-associative form.
  • FIG. 6 is another specific embodiment of the scanner structure in the level 2 cache structure of the present invention.
  • FIG. 7 shows the memory and format used in the micro-track table organized in a fully associative manner.
  • FIG. 8 is an embodiment of a fully associative micro-track table.
  • Figure 4 shows a preferred embodiment of the invention.
  • A cache system including a processor core is taken as an example, but the technical solution of the present invention can also be applied to systems that include any suitable processor (Processor).
  • The processor may be a general purpose processor (CPU), a microcontroller (MCU), a digital signal processor (DSP), a graphics processor (GPU), a system on chip (SOC), an application specific integrated circuit (ASIC), and so on.
  • FIG. 1 shows an instruction prefetch structure 100 of the present invention in which the level 2 cache is organized in multi-way set-associative form.
  • The structure 100 includes an active table (Active List) 104, a scanner 108, a track table 110, a tracker (Tracker) 114, a level 2 instruction cache (L2 Cache) 106, a level 1 instruction cache (L1 Cache) 112, and a processor core (CPU Core) 116.
  • An instruction address (Instruction Address) is the address at which an instruction is stored in main memory; that is, the instruction can be found in main memory according to this address.
  • Here the virtual address is assumed to be equal to the physical address; the method of the present invention is also applicable where address mapping is required.
  • A branch instruction (Branch Instruction) or branch source (Branch Source) refers to any form of instruction that causes the processor core 116 to change the execution flow (Execution Flow), e.g., to execute instructions out of sequential order.
  • The branch source address (Branch Source Address) can be the instruction address of the branch instruction itself.
  • The branch target (Branch Target) refers to the target instruction to which the branch caused by a branch instruction transfers control. The branch target address (Branch Target Address) can refer to the address transferred to when the branch of the branch instruction is taken, that is, the instruction address of the branch target instruction.
  • The current instruction can refer to the instruction currently being executed or fetched by the processor core.
  • The current instruction block can refer to the instruction block containing the instruction currently being executed by the processor.
  • The level 1 instruction cache 112 is organized in fully associative form. Each memory row of the level 1 instruction cache 112 is called a level 1 instruction block, and the level 1 instruction cache 112 stores at least one level 1 instruction block, which contains a run of consecutive instructions including the current instruction.
  • The level 1 instruction cache 112 contains a plurality of level 1 instruction blocks, each containing a plurality of instructions. Each level 1 instruction block stored in the level 1 instruction cache 112 has a level 1 block number (BNX1), which is the row number of that level 1 instruction block in the level 1 instruction cache 112.
  • The level 2 instruction cache 106 consists of two identical memories 126 and 128. Each memory forms one way, and both ways have the same number of rows; that is, the cache is in two-way set-associative form.
  • Each storage row of memories 126 and 128 is called a level 2 instruction block. Each level 2 instruction block has a level 2 block number (BNX2), which is determined by the row number of the level 2 instruction block in the level 2 instruction cache and by the way in which it is located; that is, the index bits of the instruction row address plus the bit indicating the way that holds the instruction.
  • Each level 2 instruction block contains a plurality of level 1 instruction blocks.
  • In the present invention, the level 2 block number refers to the location of the level 2 instruction block in the level 2 instruction cache 106.
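As a small worked example of how a level 2 block number could combine the way and the index, consider the sketch below. The bit layout (way bit above the index bits), the width `L2_INDEX_BITS`, and the helper names are assumptions made for illustration; the text states only that BNX2 is formed from the index bits plus the way bit.

```python
L2_INDEX_BITS = 4  # assumption: 16 rows per way


def make_bnx2(way, index):
    """Pack a way number and a row index into one level 2 block number."""
    return (way << L2_INDEX_BITS) | index


def split_bnx2(bnx2):
    """Recover (way, index) from a level 2 block number."""
    return bnx2 >> L2_INDEX_BITS, bnx2 & ((1 << L2_INDEX_BITS) - 1)
```

Because BNX2 names a physical location rather than an address, no tag comparison is needed once a BNX2 is known.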
  • The level 2 instruction cache 106 and the level 1 instruction cache 112 may comprise any suitable storage device, such as a register (Register) or register file, static memory (SRAM), dynamic memory (DRAM), flash memory (Flash Memory), a hard disk, a solid state disk, or any other suitable storage device or future form of memory.
  • The level 2 instruction cache 106 can work as a cache for the system, or as a level 1 cache when other caches exist. It can be partitioned into a plurality of storage segments, called memory blocks, that store data to be accessed by the processor core 116, such as the instructions in an instruction block (Instruction Block).
  • The active table 104 contains two tag arrays 118 and 120 and two storage arrays 122 and 124 that store level 1 block numbers (BNX1). Since the level 2 instruction cache 106 is in two-way set-associative form, the active table is also organized in two-way set-associative form.
  • Each tag array and storage array in the active table 104 corresponds to one way of the level 2 instruction cache 106: tag array 118 and storage array 122 correspond to way 126 of the level 2 cache, and tag array 120 and storage array 124 correspond to way 128.
  • The elements that make up storage arrays 122 and 124 are called entries. Each entry stores a level 1 block number BNX1 and a valid bit (Valid bit), recording the relationship between a level 1 instruction block and the level 2 instruction cache. Since each level 2 instruction block contains a plurality of level 1 instruction blocks, each row of storage arrays 122 and 124 in the active table 104 contains a plurality of entries, each storing the row number BNX1 in the level 1 instruction cache 112 of a level 1 instruction block belonging to that level 2 instruction block.
  • The scanner 108 examines each instruction of the level 1 instruction block being filled from the level 2 instruction cache 106 into the level 1 instruction cache 112.
  • It obtains the instruction type information and determines whether the instruction is a branch instruction or a non-branch instruction. If the instruction is a branch instruction, the target address of the branch instruction is calculated.
  • The calculation uses an adder to add the branch displacement to the current instruction address, giving the target address of the branch instruction. The calculated target address is then sent to the active table 104 for matching.
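The scanner's target calculation can be sketched as below. The instruction encoding is a toy one invented for the example: each instruction is a `(kind, displacement)` pair, `instr_size` is an assumed fixed instruction width, and `scan_block` is a hypothetical name. Real instruction sets define their own displacement encoding and base.

```python
def scan_block(block_base, instructions, instr_size=4):
    """Return (branch_source_address, branch_target_address) for each branch.

    instructions: list of (kind, displacement) pairs, where kind is
    "branch" or anything else; displacement is ignored for non-branches.
    """
    found = []
    for i, (kind, displacement) in enumerate(instructions):
        pc = block_base + i * instr_size          # address of this instruction
        if kind == "branch":
            found.append((pc, pc + displacement))  # adder: PC + displacement
    return found
```

For instance, `scan_block(0x100, [("alu", 0), ("branch", 16)])` yields one pair, `(0x104, 0x114)`, and that target address is what would be sent on for matching.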
  • Each row of the track table 110 corresponds to a row of the level 1 instruction cache 112, and both are pointed to by the same row pointer.
  • Each row of the track table 110 includes a plurality of track points, each of which corresponds to one instruction in a row of the level 1 instruction cache 112.
  • That is, the number of track points per row of the track table equals the number of instructions per row of the level 1 instruction cache.
  • A track point is an entry in the track table. It may contain information about at least one instruction, such as instruction type information and the branch target address.
  • The track table address of a track point corresponds (Correspond) to the instruction address of the instruction that the track point represents; a branch instruction track point contains the track table address of the branch target, and that address corresponds to the branch target instruction address.
  • The consecutive track points that correspond to an instruction block, formed by a series of consecutive instructions in the level 1 instruction cache 112, are called a track.
  • The instruction block and its corresponding track are indicated by the same level 1 block number (BNX1).
  • the total number of track points in a track can be equal to the total number of entries in a row in track table 110.
  • the track table 110 can also have other organizational forms.
  • Suppose that when the processor core 116 fetches an instruction from the level 1 instruction cache 112 as needed, the instruction is not yet stored in the level 1 instruction cache.
  • According to the instruction address (PC), the instruction is then filled from lower-level memory into the level 2 instruction block of the level 2 instruction cache 106 whose level 2 block number is selected by a replacement algorithm (e.g., LRU).
  • The corresponding level 1 instruction block in the level 2 cache 106 is then filled into the storage row of the level 1 instruction cache 112 pointed to by the BNX1 that a replacement algorithm (such as LRU) selects.
  • The replacement algorithm can be first-in-first-out (FIFO), least recently used (LRU), random replacement (Random), or another existing algorithm.
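Of the replacement policies named above, least recently used (LRU) is easy to sketch in software. The class `LRURows` and its methods are hypothetical; the text says only that LRU, FIFO, or random replacement may select the row to fill.

```python
from collections import OrderedDict


class LRURows:
    """Track row usage and pick the least recently used row as the victim."""

    def __init__(self, num_rows):
        # Keys are row numbers, ordered from least to most recently used.
        self.order = OrderedDict((row, None) for row in range(num_rows))

    def touch(self, row):
        """Mark a row as just used."""
        self.order.move_to_end(row)

    def victim(self):
        """Return the row to replace and mark it as just used (newly filled)."""
        row = next(iter(self.order))
        self.order.move_to_end(row)
        return row
```

The row number returned by `victim()` plays the role of the BNX1 that the newly filled block will occupy.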
  • The scanner 108 examines the instruction types in the level 1 instruction block, extracts the branch information of each branch instruction, and calculates the branch target address.
  • The calculation uses an adder to add the branch displacement to the current instruction address, giving the target address of the branch instruction.
  • The term "fill" (Fill) means moving instructions from lower-level memory to higher-level memory.
  • The branch target instruction address examined and calculated by the scanner 108 is matched against the instruction row addresses stored in the active table to determine whether the branch target instruction is already stored in the level 2 instruction cache 106. First, the index bits of the branch target instruction address are used to read the two tags stored in the active table; the two tags are then compared with the tag bits of the calculated branch target instruction address.
  • The entry corresponding to the instruction in the way that matched is then selected. If the level 1 block number (BNX1) stored in the entry is valid, the target branch instruction is already stored in the level 1 instruction cache 112; the level 1 block number BNX1 stored in the active table and the offset (Offset) of the calculated branch target address are written together into the track table, at the track point corresponding to the branch source address. If the level 1 block number (BNX1) stored in the entry is invalid, the target branch instruction is not stored in the level 1 instruction cache 112 but only in the level 2 instruction cache 106; the level 2 block number BNX2 corresponding to the instruction, the block offset of the calculated branch target address, and the branch target address offset are written together into the track table, at the track point corresponding to the branch source address. If neither tag matches successfully, indicating that the instruction line where the branch target information
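The matching flow just described can be sketched as follows, under several assumptions: the field widths are invented, `ActiveRow` and `match_target` are hypothetical names, a `None` entry stands in for a cleared valid bit, and BNX2 is packed as way bit above index bits. The no-match case returns `None` without modeling what the hardware does next.

```python
from dataclasses import dataclass, field
from typing import List, Optional

OFFSET_BITS = 4     # assumption: offset within a level 1 block
SUBBLOCK_BITS = 2   # assumption: 4 level 1 blocks per level 2 block
INDEX_BITS = 3      # assumption: 8 rows per way


@dataclass
class ActiveRow:
    tag: Optional[int] = None
    # One BNX1 entry per level 1 block; None models a cleared valid bit.
    bnx1: List[Optional[int]] = field(
        default_factory=lambda: [None] * (1 << SUBBLOCK_BITS))


def split(addr):
    """Split an address into (tag, index, block offset, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    sub = (addr >> OFFSET_BITS) & ((1 << SUBBLOCK_BITS) - 1)
    index = (addr >> (OFFSET_BITS + SUBBLOCK_BITS)) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + SUBBLOCK_BITS + INDEX_BITS)
    return tag, index, sub, offset


def match_target(addr, ways):
    """Return the track point content for a branch target, or None on no match."""
    tag, index, sub, offset = split(addr)
    for way, rows in enumerate(ways):
        row = rows[index]
        if row.tag == tag:
            if row.bnx1[sub] is not None:          # target already in L1
                return ("BNX1", row.bnx1[sub], offset)
            bnx2 = (way << INDEX_BITS) | index     # target only in L2
            return ("BNX2", bnx2, sub, offset)
    return None
```

The two return shapes mirror the two track point formats in the text: BNX1 plus offset, or BNX2 plus block offset plus offset.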
  • A first address and a second address may be used to represent the position of a track point (instruction) in the track table. The first address is the block number of the track to which the track point belongs (pointing to one track in the track table and the corresponding level 1 instruction block in the level 1 instruction cache); the second address is the relative position (offset, Address Offset) of the track point (that is, of the corresponding instruction) within the track (storage block).
  • A pair of first and second addresses corresponds to one track point in the track table; that is, the corresponding track point can be found in the track table from a pair of first and second addresses.
  • The track of a branch target can be determined from the first address contained in the content of a track table entry, and the specific track point within that target track from the second address.
  • The track table thus becomes a table that represents each branch instruction with the branch source address corresponding to the address of the track table entry and the branch target address corresponding to the content of the entry.
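A track table addressed by (first address, second address) can be modeled as a two-dimensional list. In this sketch `TrackPoint`, `make_track_table`, and `follow_branch` are hypothetical names, and the `kind` strings are a simplification of the instruction type information stored in a track point.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TrackPoint:
    kind: str = "other"                       # "branch", "end", or "other"
    target: Optional[Tuple[int, int]] = None  # (first, second) of the target


def make_track_table(rows, points_per_row):
    """One row (track) per level 1 block, one track point per instruction."""
    return [[TrackPoint() for _ in range(points_per_row)] for _ in range(rows)]


def follow_branch(table, first, second):
    """Return the (first, second) address stored at a branch track point."""
    point = table[first][second]
    if point.kind != "branch" or point.target is None:
        return None
    return point.target
```

The entry's position encodes the branch source and its content encodes the branch target, so following a branch is a single table read.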
  • To link a track to the next track, an end track point is placed after the track point that represents the last instruction of each track; it stores the first address of the next track (instruction block) in sequential execution order.
  • A plurality of instruction blocks can be stored in the level 1 instruction cache 112.
  • The instruction block to be executed next in sequence is also fetched into the instruction read buffer, ready for the processor core 116 to read and execute.
  • The instruction address of the next instruction block can be obtained by adding the address length of one instruction block to the instruction address of the current instruction block.
  • This address is sent to the active table 104 for matching as described above, and the instruction block obtained is filled into the instruction block of the level 1 instruction cache 112 indicated by the replacement algorithm.
  • The instructions in the next instruction block newly stored in the level 1 instruction cache 112 are also scanned by the scanner 108, and the extracted information is filled into the track indicated by the level 1 block number BNX1 as described above.
  • The replacement algorithm can be first-in-first-out (FIFO), least recently used (LRU), random replacement (Random), or another existing algorithm.
  • the tracker 114 is mainly composed of a selector 130, a register 132, and an incrementer 134.
  • the read pointer of the tracker 114 points to the first branch instruction track point after the current instruction in the track of the track table 110 in which the current instruction is located, or to the end track point of that track if no branch track point follows the current instruction on the track.
  • the read pointer of the tracker 114 consists of a first address pointer and a second address pointer. The value of the first address pointer is the primary block number of the level-one instruction block in which the current instruction is located, that is, a row pointer; the second address pointer points to the first branch instruction track point after the current instruction on the track, or to the end track point.
  • when the processor core 116 fetches instructions from the level-one instruction cache 112 as needed, the tracker 114 provides the primary block number BNX1 used to address the level-one instruction block, while the processor core provides the offset used to fetch the corresponding instruction and supplies the BRANCH signal and the TAKEN signal to the tracker 114.
  • the BRANCH signal indicates whether the instruction is a branch instruction, and the TAKEN signal controls the output of the selector.
  • the tracker 114 is used to point to the first branch instruction after the current instruction, or to the end track point of the track if no branch track point follows the current instruction on the track, and to provide the processor core 116 with the primary block number BNX1 of the current instruction.
  • the processor core 116 fetches the instruction directly from the level-one instruction cache 112.
  • the secondary block number BNX2 is used as the active table address to look up the active table. If the primary block number BNX1 stored in the entry corresponding to the secondary block number is valid, this indicates that another branch instruction executed before this one had a target address equal to the instruction address corresponding to this secondary block number, and that the target instruction has already been fetched into the level-one instruction cache 112.
  • in that case the primary block number BNX1 is written into the track point, and when the instruction is executed the processor core 116 fetches it directly from the level-one instruction cache 112.
  • if the primary block number BNX1 stored in the entry corresponding to the secondary block number is invalid, the target instruction is not in the level-one instruction cache 112. A primary block number BNX1 is then determined according to the replacement policy, the target instruction line is taken from the level-two instruction cache 106 and filled into the corresponding level-one instruction block of the level-one instruction cache 112, and the primary block number BNX1 is written into the active table 104.
  • the processor core 116 then fetches the instruction directly from the level-one instruction cache 112.
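The resolution step described above, in which a track point holding a secondary block number is converted into one holding a primary block number, can be sketched as follows. This is only an illustration under stated assumptions: the tuple tags `"BN1"` and `"BN2"`, the dictionary layout of the active table, and the callbacks `choose_bnx1` and `fill_block` are hypothetical names introduced here, and the numeric values are illustrative.

```python
def resolve(point, active, choose_bnx1, fill_block):
    """Return a track point in BN1 form, filling the level-one cache if needed.

    point is ("BN1", bnx1, offset) or ("BN2", way, index, block_offset, offset).
    active maps (way, index) to a list of four entries {"valid", "bnx1"}.
    """
    if point[0] == "BN1":
        return point  # target already addressed by a primary block number
    _, way, index, block_offset, offset = point
    entry = active[(way, index)][block_offset]
    if not entry["valid"]:
        # Target block is only in the level-two cache: pick a level-one
        # block by the replacement policy, fill it, and record it.
        bnx1 = choose_bnx1()
        fill_block(way, index, block_offset, bnx1)
        entry["valid"], entry["bnx1"] = True, bnx1
    return ("BN1", entry["bnx1"], offset)


# Illustrative active table with one row in each of two way groups.
active = {
    (0, 14): [{"valid": False, "bnx1": None} for _ in range(4)],
    (1, 14): [{"valid": False, "bnx1": None} for _ in range(4)],
}
active[(0, 14)][3] = {"valid": True, "bnx1": 9}

filled = []
hit = resolve(("BN2", 0, 14, 3, 5), active,
              lambda: 38, lambda *args: filled.append(args))
miss = resolve(("BN2", 1, 14, 2, 0), active,
               lambda: 38, lambda *args: filled.append(args))
```

In the first call the active table entry is already valid, so no fill takes place; in the second call the block is filled once and the active table entry becomes valid.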
  • if the branch instruction pointed to by the tracker 114 is not taken, the read pointer of the tracker 114 moves to the first branch instruction track point after that branch instruction, or to the end track point of the track if no branch instruction track point follows it.
  • the processor core then reads and executes the sequential instructions following the branch instruction.
  • the branch target instruction block read from the instruction memory 106 as described above is stored in the instruction block of the instruction read buffer 112 specified by the buffer replacement logic, and the new track information generated by the scanner is filled into the corresponding track of the track table 110.
  • the branch target first address and the second address become new tracker address pointers, and point to track points corresponding to the branch targets in the track table.
  • the new tracker address pointer also points to the newly filled branch instruction block, making it the new current instruction block.
  • the processor core uses the offset bits of the instruction address (PC) to select the required instruction from the new current instruction block.
  • the read pointer then moves to the first branch instruction track point after the branch target instruction in the corresponding track of the new current instruction block, or to the end track point of that track if no branch instruction track point follows the branch target instruction.
  • when the tracker 114 points to the end track point of a track, the read pointer of the tracker 114 is updated to the position value stored in the end track point, that is, it points to the first track point of the next track and thereby to the new current instruction block. The tracker 114 then moves the read pointer to the first branch instruction track point in the corresponding track of the new current instruction block, or to the end track point of that track if the track has no branch instruction track point. By repeating this process, instructions are filled into the instruction read buffer 112 before the processor core 116 executes them, so that the processor core 116 does not need to wait when fetching instructions, thereby improving processor performance.
  • FIG. 2 is an embodiment of the tracker read pointer movement of the present invention.
  • the tracker read pointer moves over the non-branch instructions in the track table, stops at the next branch point in the track table, and waits for the branch decision of the processor core 116.
  • components unrelated to the description of this embodiment are omitted from FIG. 2.
  • the instruction types and instruction information stored in the track table 110 are arranged from left to right in ascending order of instruction address; that is, when the instructions are executed in order, each instruction's information and its instruction type are accessed from left to right.
  • an instruction type of '0' in the track table 110 means that the corresponding instruction is a non-branch instruction, and an instruction type of '1' means that the corresponding instruction is a branch instruction.
  • at any time, the entry indicated by the second address 216 (offset, BNY) can be read from the track of the track table 110 indicated by the first address 214 (primary block number BNX1). A plurality of entries representing instruction types, or even all such entries, can also be read at any time from the track of the track table 110 indicated by the first address 214.
  • an end entry is added to the right of the entry of the instruction with the largest instruction address in each row, to store the address of the next instruction executed in sequence.
  • the instruction type of the end entry is always set to '1'.
  • the first address of the instruction information in the end entry is the instruction block number of the next instruction, and the second address (BNY) is always zero, pointing to the first entry of that instruction track.
  • the end entry is defined to be equivalent to an unconditional branch instruction.
  • the tracker 114 mainly includes a shifter 202, a leading zero counter 204, an adder 206, a selector 208, and a register 210.
  • the shifter 202 shifts left the plurality of instruction types 218, representing a plurality of instructions, read from the track table 110; the number of bits shifted is determined by the second address pointer 216 output by the register 210.
  • the leftmost bit of the shifted instruction type 224 output by the shifter 202 is the step bit (STEP Bit).
  • the step bit signal and the BRANCH signal from the processor core together determine the update of the register 210.
  • the selector 208 is controlled by the control signal TAKEN, and its output 232 is the next address (Next Address), which contains a first address part and a second address part.
  • when TAKEN is '1' (the branch is taken), the selector 208 selects the output 230 of the track table 110 (containing the first address and the second address of the branch target) as the output 232.
  • when TAKEN is '0' (the branch is not taken), the selector 208 selects the current first address 214 as the first address part of the output 232 and the adder output 228 as the second address part of the output 232.
  • the shifted instruction type 224 is sent to the leading zero counter 204, which counts how many '0' instruction types (each representing a non-branch instruction) precede the next '1' instruction type (representing a branch instruction); the step bit is counted as one '0' bit regardless of whether it is '0' or '1'.
  • the resulting number of leading '0's 226 (the step number, STEP Number) is sent to the adder 206 and added to the second address 216 output by the register 210 to give the next branch source address (Next Branch Address) 228.
  • the next branch source address is the second address of the next branch instruction after the current instruction; the non-branch instructions before it are skipped by the tracker 114.
  • the shifter, controlled by the second address, shifts the plurality of instruction types output by the track table 110 uniformly to the left.
  • as a result, the instruction type of the instruction currently being read from the track table 110 is shifted into the leftmost step bit of the instruction type 224.
  • the shifted instruction type 224 is sent to the leading zero counter, which counts the number of instructions before the next branch instruction.
  • the output 226 of the leading zero counter 204 is the step by which the tracker should advance; adding this step to the second address 216 in the adder 206 gives the next branch instruction address 228.
  • when the step bit signal in the shifted instruction type 224 is '0', the entry of the track table 110 pointed to by the second address is a non-branch instruction. The step bit signal then controls the register 210 to update, and the selector 208, with the TAKEN signal 222 at '0', selects the next branch source address 228 as the new second address 216, while the first address 214 remains unchanged.
  • the new second address 216 controls the shifter to shift the instruction types 218 so that the instruction type bit representing the branch instruction falls on the step bit of 224 for the next operation.
  • when the step bit signal in the shifted instruction type 224 is '1', the entry of the track table 110 pointed to by the second address represents a branch instruction.
  • in this case the step bit signal does not affect the update of the register 210; the register 210 is updated under the control of the BRANCH signal 234 from the processor core.
  • the adder output 228 is then the address of the next branch instruction on the same track as the current branch instruction, and the track table output 230 is the target address of the current branch instruction.
  • when the BRANCH signal is '1', the output 232 of the selector 208 updates the register 210. If the TAKEN signal 222 from the processor core is '0' at this time, the processor core has decided to continue sequential execution at this branch point, and the selector 208 selects the next branch source address 228. The first address 214 output by the register 210 is then unchanged, and the next branch source address 228 becomes the new second address 216. The new first and second addresses point to the next branch instruction in the same track. The new second address 216 controls the shifter to shift the instruction types 218 so that the instruction type bit representing that branch instruction falls on the step bit of 224 for the next operation.
  • if the TAKEN signal 222 is '1', the selector 208 selects the branch target address 230 read from the track table 110 to become the future first address 214 and second address 216 output by the register 210.
  • the BRANCH signal 234 then controls the register 210 to latch these first and second addresses as the new first and second addresses.
  • the new first and second addresses point to the branch target, which may not be on the same track.
  • the new second address 216 controls the shifter to shift the instruction types 218 so that the instruction type bit representing the branch instruction falls on the step bit of 224 for the next operation.
  • when the second address points to the end entry of a track, an internal control signal controls the selector 208 to select the output 230 of the track table 110 and update the register 210, as described above.
  • the new first address 214 is then the first address of the next track recorded in the end entry of the track table 110, and the second address is zero.
  • the second address 216 controls the shifter to shift the instruction types 218 accordingly, and the next operation begins. By repeating this process, the tracker 114, in cooperation with the track table 110, skips the non-branch instructions in the track table and always points to a branch instruction.
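The pointer update performed by the shifter 202, leading zero counter 204, adder 206, and selector 208 can be modelled in software as below. This is a behavioural sketch under assumptions, not the hardware of the patent: the names `TrackPoint`, `step_number`, and `next_pointer` are hypothetical, a track is modelled as a Python list whose last element is the end entry, and the sample targets are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TrackPoint:
    is_branch: bool                            # instruction type: True is '1'
    target: Optional[Tuple[int, int]] = None   # (BNX1, BNY) of the target


def step_number(types, bny):
    """Model of shifter 202 plus leading zero counter 204."""
    shifted = types[bny:]      # entry at bny becomes the step bit
    count = 1                  # the step bit always counts as one '0'
    for bit in shifted[1:]:
        if bit:
            break
        count += 1
    return count


def next_pointer(track, bnx1, bny, branch, taken):
    """Return the next (first address, second address) of the read pointer."""
    types = [p.is_branch for p in track]
    next_branch = bny + step_number(types, bny)   # adder 206
    if not types[bny]:
        return (bnx1, next_branch)     # step bit '0': skip non-branch entries
    if bny == len(track) - 1:
        return track[bny].target       # end entry acts as unconditional branch
    if not branch:
        return (bnx1, bny)             # wait for the core's BRANCH signal
    return track[bny].target if taken else (bnx1, next_branch)


# Five instructions (types 0,0,1,0,1) followed by the end entry.
track = [
    TrackPoint(False), TrackPoint(False), TrackPoint(True, (7, 1)),
    TrackPoint(False), TrackPoint(True, (2, 0)), TrackPoint(True, (1, 0)),
]
skipped = next_pointer(track, 0, 0, branch=False, taken=False)
not_taken = next_pointer(track, 0, 2, branch=True, taken=False)
taken = next_pointer(track, 0, 2, branch=True, taken=True)
```

Counting the step bit as a '0' is what lets the same adder output serve both cases: from a non-branch entry it lands on the next branch, and from a branch entry it lands on the branch after it.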
  • FIG. 3 is an embodiment 300 of the level-one instruction block, the level-two instruction block, and their addressing relationship according to the present invention.
  • the instruction address 301 is 40 bits long, with bit 39 the highest and bit 0 the lowest, and each instruction address corresponds to one byte. The lowest two bits 302 of the instruction address 301 (bits 1 and 0) therefore select among the 4 bytes of one instruction word. In this embodiment the highest 8 bits of the instruction address 301 are assumed to be the process identification bits (PID) 310, which indicate which process is currently executing.
  • the process identification bits 310 make it possible to determine whether the currently executing process is stored in the instruction cache; if it is not, prefetching can be performed using the entire address 301, thereby avoiding instruction cache misses.
  • the process identification bits 310 can also be omitted, in which case the instruction address is 32 bits long. For ease of explanation, the lowest two bits 302 and the highest eight bits 310 of the instruction address are dropped below, and the remaining 30 bits (bits 31 to 2) constitute a new instruction line address 312.
  • a level-one instruction block contains 16 instructions, so the offset (Offset) 303 in the instruction line address 312 has 4 bits, which determine the position of an instruction within the level-one instruction block.
  • the offset 303 corresponds to the second address (BNY) described earlier, so it can also be used to determine which track point in the track table corresponds to the instruction.
  • if the track table has 512 rows, the primary block number BNX1 has 9 bits, and its value is determined by the row number. When a level-one instruction block is filled from the level-two instruction cache 106 into the level-one instruction cache 112 as required by the processor core 116, and the branch target instruction of a branch instruction, determined by the foregoing method, is already stored in the level-one instruction cache 112, the corresponding primary block number stored in the active table 104 is written together with the offset 303 into the track point of the track table corresponding to the branch source instruction. When the processor core 116 executes the branch instruction, the target instruction can then be read directly from the level-one instruction cache 112.
  • the tag bits 311 of the instruction line address 312 are stored in the tag array 118 or 120 of one way group of the active table 104, where they are compared with the target instruction address generated by the scanner 108 to obtain matching information. In this embodiment the active table 104 and the level-two instruction cache blocks 126 and 128 are each assumed to have 1024 rows, so the index bits 307 of the instruction line address 312 have 10 bits (bits 17 to 8). The index bits 307 are used to find the row of the level-two instruction cache that holds the level-two instruction block, and also to read out of the active table 104 the tags stored in the tag arrays 118 and 120 of each way group and the valid values in the corresponding entries of each way group.
  • the block offset (Block-offset) 306 has two bits, bits 7 and 6. The block offset 306 is used to select a level-one instruction block within a level-two instruction block stored in the level-two instruction cache 106, and to select which entry of the active table row supplies the valid value. The way group number of the level-two instruction cache 106 in which the level-two instruction block is located, together with the index bits 307 of the instruction line address 312, therefore constitutes the secondary block number BNX2.
  • when a level-one instruction block is filled from the level-two instruction cache 106 into the level-one instruction cache 112 as required by the processor, and the branch target instruction of a branch instruction, determined by the foregoing method, is not stored in the level-one instruction cache 112 but is stored in the level-two instruction cache 106, the corresponding secondary block number BNX2 is written, together with the block offset 306 and the offset 303 of the branch target address, into the track point of the track table corresponding to the branch source instruction. When the tracker pointer reaches that track point, the corresponding level-one instruction block is filled from the level-two instruction cache 106 into the level-one cache block pointed to by the primary block number BNX1 determined by the replacement policy (such as LRU), so that when the processor core 116 executes the branch instruction it can read the instruction directly from the level-one instruction cache 112.
  • in this way the mapping between an instruction's position in the level-one instruction cache and its position in the level-two instruction cache can be established.
  • the primary block number BNX1 together with the offset 303 of the instruction line address 312 determines the location of the instruction within the level-one instruction block stored in the level-one instruction cache 112; the block offset 306 of the instruction line address 312 determines the position of the level-one instruction block within the level-two instruction block stored in the level-two instruction cache 106; and the index bits 307 of the instruction line address 312 together with the way group number of the level-two instruction block (that is, the secondary block number BNX2) determine the location of the level-two instruction block in the level-two instruction cache 106.
  • the primary block number BNX1 and the secondary block number BNX2 have no fixed mapping relationship: BNX1 is determined by the replacement algorithm (such as LRU) when a level-one instruction block is filled from the level-two instruction cache 106 into the level-one instruction cache 112. The second address (BNY), which indicates the position of the instruction, is the same in the level-one and level-two instruction caches, namely the offset 303 of the instruction line address 312.
  • the mapping relationship between the instruction in the level one instruction cache and the level two instruction cache can be established.
  • the target instruction address calculated by the scanner 108 can be matched against the instruction addresses stored in the active table to obtain matching information, and the secondary block number BNX2 or the primary block number BNX1 is then written into the track table to generate a new track.
  • Target instruction address 312 is described using a portion of the complete instruction address.
  • Target instruction address 312 includes tag bit 311, index bit 307, block offset 306, and offset 303.
  • the tag bits 311 are compared with the tags 302 and 304 in the active table 104 to obtain matching information; the index bits 307 are used to find which row of the active table corresponds to the address; the block offset 306 is used to select the corresponding level-one instruction block within the level-two instruction block; and the offset 303 is used to determine the position of the target instruction within the level-one instruction line, that is, to provide the second address (BNY).
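The field layout of a 32-bit instruction address can be sketched as a bit-slicing function. The index bits (17 to 8), block offset bits (7 and 6), and the 4-bit offset follow the FIG. 3 embodiment; the positions of the offset (bits 5 to 2) and the 14-bit tag (bits 31 to 18) are derived from those widths and are an assumption of this sketch, as is the function name `split_address`.

```python
def split_address(addr):
    """Split a 32-bit byte address into the fields of the FIG. 3 embodiment."""
    return {
        "tag": (addr >> 18) & 0x3FFF,        # tag bits 311 (assumed bits 31..18)
        "index": (addr >> 8) & 0x3FF,        # index bits 307 (bits 17..8)
        "block_offset": (addr >> 6) & 0x3,   # block offset 306 (bits 7..6)
        "offset": (addr >> 2) & 0xF,         # offset 303 / BNY (assumed bits 5..2)
        "byte": addr & 0x3,                  # byte within the instruction word
    }


# Compose an address from illustrative field values and split it again.
addr = (1654 << 18) | (14 << 8) | (2 << 6) | (3 << 2)
fields = split_address(addr)
```

The tag is compared against the active table, the index selects the row, the block offset selects one of the four level-one blocks in that row, and the offset becomes the second address (BNY).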
  • the secondary instruction cache 106 is composed of two blocks 126 and 128.
  • the two blocks contain the same number of rows; that is, the cache is organized as two way groups (two-way set associative).
  • the active table is likewise organized as two way groups.
  • the active table 104 consists of a first part, the tag arrays 118 and 120, and a second part, the storage blocks 408 and 410. The tag arrays 118 and 120 are used to match the branch target address calculated by the scanner 108, and the storage blocks 408 and 410 are used to store the primary block number BNX1.
  • each row of each path group in the active table 104 corresponds to four entries 408 or 410.
  • the track table has the same number of rows as the active table, which is 1024 lines.
  • each row of the level-one instruction cache 112 contains 16 instructions, that is, a level-one instruction block contains 16 instructions, so each row of the track table 110 has 16 entries.
  • the level-one instruction block fetched from the level-two instruction cache 106 is filled into the level-one instruction cache according to the LRU replacement policy.
  • the level-one instruction block contains three branch instructions, located at entries 4, 7, and 11 of the block.
  • it is assumed that the value '1654' is stored in the tag of row 14 of way group 0 of the active table 104, and the value '2526' is stored in the tag of row 14 of way group 1 of the active table 104.
  • the valid bit of entry 2 of row 14 of way group 0 of the active table is '1'.
  • the valid bit of entry 3 of that row is '0'.
  • the valid bit of entry 2 of row 14 of way group 1 is '0'.
  • the scanner 108 calculates the target instruction address of the first branch instruction, whose tag bits are '1654'.
  • the index bits 307 are used to read the two tags stored in the corresponding row of the active table, and the tags read are sent to the comparator 420 and the comparator 422, respectively, for comparison with the tag bits 311 of the branch target instruction address 312 calculated by the scanner 108.
  • way group 0 matches. The block offset bits 306 of the branch target address 312 then select the corresponding entry 2 in the active table, whose valid bit is '1', so the value '5' stored in it is written, together with the offset (BNY) value '3', into the fourth entry of the third row of the track table.
  • the scanner 108 calculates the target address of the second branch instruction, whose tag bits are again '1654'.
  • its tag bits and index bits are the same as in the previous case, the value of the block offset 306 is '3', and the value of the offset 303 is '5'.
  • row 14 of way group 0 of the active table is selected by the method described above.
  • the valid bit of entry 3 is '0', indicating that the branch target instruction is not in the level-one instruction cache 112. The way group number in the active table together with the index bits 307 of the target instruction address 312 is therefore written into the track table as the secondary block number (BNX2), along with the values of the block offset 306 and the offset (BNY) 303.
  • in the value written, '0' indicates that the instruction corresponds to way group 0 of the active table, '14' indicates that the target instruction is in row 14 of the active table, '3' indicates that the instruction is in the third entry of that row, and '5' indicates that the instruction is the fifth instruction in the level-one instruction block.
  • the scanner 108 calculates the target address of the third branch instruction, whose tag bits are '3546'.
  • the method described above finds no match in any way group of the active table, indicating that the instruction is not in the level-two cache. The corresponding instruction block is therefore fetched into the level-two cache 106 according to the target address, at the position chosen by the LRU replacement algorithm: entry 2 of row 14 of way group 1 of the level-two cache.
  • the replacement algorithm can also use existing algorithms such as a first in first out algorithm (FIFO), a least recently used algorithm (LRU), and a random replacement algorithm (Rand).
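The three scanner lookups in the worked example can be reproduced with a small model of the two-way active table. The data layout (one dictionary per way group, each row holding a tag and four `(valid, BNX1)` entries), the function name `lookup`, and the result tags `"BN1"` and `"BN2"` are assumptions made for this sketch; the index and block offset used for the third branch are likewise assumed, since the example only gives where its block is eventually placed.

```python
def lookup(active, tag, index, block_offset, offset):
    """Match a scanner-computed target address against the active table."""
    for way, rows in enumerate(active):
        row = rows.get(index)
        if row is None or row["tag"] != tag:
            continue                          # comparator mismatch in this way
        valid, bnx1 = row["entries"][block_offset]
        if valid:
            return ("BN1", bnx1, offset)      # target already in level-one cache
        return ("BN2", way, index, block_offset, offset)  # level-two cache only
    return None                               # not in the level-two cache either


empty = (False, None)
active = [
    {14: {"tag": 1654, "entries": [empty, empty, (True, 5), empty]}},  # way 0
    {14: {"tag": 2526, "entries": [empty, empty, empty, empty]}},      # way 1
]

first = lookup(active, 1654, 14, 2, 3)    # entry 2 valid, holds BNX1 = 5
second = lookup(active, 1654, 14, 3, 5)   # entry 3 invalid
third = lookup(active, 3546, 14, 2, 0)    # no tag match in either way
```

A `None` result corresponds to the case where the block must first be fetched into the level-two cache at a position chosen by the replacement algorithm.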
  • the value stored in the track point is read out as '5
  • the target instruction address of a branch instruction is '1654
  • the instruction has been executed, indicating that the instruction has been populated into the level one instruction cache 112.
  • when the read pointer of the tracker 114 points to the seventh entry of the third row of the track table, the value stored in the track point is read out and the active table entry it designates is examined.
  • that entry indicates that the primary block number BNX1 stored in it is valid, so the instruction is read directly from the corresponding row of the level-one cache according to the primary block number BNX1 and no longer needs to be read from the level-two cache.
  • the primary block number value '9' stored in the entry is written into the seventh entry of the third row of the track table 110, so that this entry now stores the value '9'.
  • the processor core 116 can then read the instructions directly from row 9 of the level-one instruction cache 112.
  • the value stored in the track point is read out; it designates way group 1 of the active table.
  • the primary block number BNX1 stored in the second entry of the row is invalid, indicating that the corresponding branch target instruction is not in the level-one instruction cache 112. The corresponding level-one instruction block stored in the level-two cache 106 is therefore filled into the level-one instruction block pointed to by the primary block number BNX1 determined by the replacement algorithm.
  • the value of BNX1 is assumed to be 38.
  • the corresponding level-one instruction block stored in the level-two instruction cache 106 is filled into row 38 of the level-one instruction cache 112, the value '38' is written into entry 2 of row 14 of way group 1 of the active table 104, the valid bit of that entry is set to '1', and the value '38' is written into the track point.
  • the update of the active table and the track table is then complete.
  • the replacement algorithm can also be an existing algorithm such as first in, first out (FIFO), least recently used (LRU), or random replacement (Random).
  • a storage field P may be added to each entry of the active table to hold the way group number in the secondary block number of the level-two instruction block that precedes, in sequential address order, the level-two instruction block corresponding to the entry, together with a storage field N that holds the way group number in the secondary block number of the level-two instruction block that follows it in sequential address order.
  • in this way, the way group number of the previous or next level-two instruction block can be read from the active table according to the current secondary block number; concatenating it with the index bits of the examined branch instruction decremented or incremented by one yields the secondary block number of the previous or next level-two instruction block, thereby avoiding the operation of sending the branch target instruction address to the active table for matching.
  • when the scanner examines a level-one instruction block (referred to as the current level-one instruction block), if the current level-one instruction block is the last level-one instruction block in its level-two instruction block (referred to as the current level-two instruction block), the end track point corresponding to the current level-one instruction block is established as before. If the level-two instruction block containing the next level-one instruction block in sequential address order (referred to as the next level-two instruction block) is already stored in the level-two cache, the secondary block number corresponding to the next level-two instruction block is filled directly into the end track point as the track point content; if the next level-two instruction block is not yet stored in the level-two cache, it is filled as described above into the position in the level-two cache determined by the replacement algorithm, and the corresponding secondary block number is filled into the end track point as the track point content.
  • in this process, the secondary block number of the level-two instruction block that follows the current level-two instruction block is the secondary block number of the next level-two instruction block, so the way group number in that secondary block number can be filled as the field content into the storage field N of the active table entry pointed to by the secondary block number corresponding to the current level-two instruction block (referred to as the current secondary block number).
  • likewise, the secondary block number of the level-two instruction block that precedes the next level-two instruction block is the current secondary block number, so the way group number in the current secondary block number can be filled as the field content into the storage field P of the active table entry pointed to by the secondary block number corresponding to the next level-two instruction block.
  • the index bits differ by '1'; therefore, the index bits of the secondary instruction block address can be decremented by one and incremented by one, thereby obtaining the index values of the previous and the next secondary instruction block in sequential address order, and
  • the contents stored in all way groups at the corresponding location are read from the active table according to the calculated index value. All the tags in the contents read out are then compared with the tag of the current secondary instruction block.
  • the way group number of the matching entry can be used as the storage domain content to fill the storage domain in the active table entry pointed to by the current secondary block number.
  • the way group number in the current secondary block number is filled, as the storage domain content, into the storage domain in the matching entry.
  • the way group number of the matching entry may be filled, as the storage domain content, into storage domain N in the active table entry pointed to by the current secondary block number, and the way group number in the current secondary block number is filled, as the storage domain content, into storage domain P in the matching entry.
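  • The neighbour-linking step described above can be sketched as follows. This is a minimal illustration, not the patented circuit: the `Entry` class, the function names, and the handling of index wrap-around (the neighbour's tag changes by one when the 10-bit index overflows) are assumptions made for the example.

```python
INDEX_BITS = 10            # index width assumed from the FIG. 5 embodiment
NUM_SETS = 1 << INDEX_BITS
NUM_WAYS = 2               # two way groups


class Entry:
    """One active table entry; P and N hold way group numbers (None = invalid)."""

    def __init__(self):
        self.tag = None     # None models an empty entry
        self.p = None       # storage domain P: way group of the previous block
        self.n = None       # storage domain N: way group of the next block


def new_table():
    return [[Entry() for _ in range(NUM_WAYS)] for _ in range(NUM_SETS)]


def find_way(table, tag, index):
    """Read all way groups at `index` and compare their tags with `tag`."""
    for way, entry in enumerate(table[index]):
        if entry.tag == tag:
            return way
    return None


def link_neighbours(table, tag, index, way):
    """Fill P/N of the block at (index, way) and of its sequential neighbours."""
    current = table[index][way]
    block = (tag << INDEX_BITS) | index           # sequential block address
    for delta in (-1, +1):
        neighbour_block = block + delta
        if neighbour_block < 0:
            continue
        n_tag = neighbour_block >> INDEX_BITS     # same tag unless the index wraps (assumption)
        n_index = neighbour_block & (NUM_SETS - 1)
        n_way = find_way(table, n_tag, n_index)
        if n_way is None:
            continue                               # neighbour not cached: fields stay invalid
        neighbour = table[n_index][n_way]
        if delta < 0:
            current.p, neighbour.n = n_way, way    # previous block found
        else:
            current.n, neighbour.p = n_way, way    # next block found
```

    Both directions are written symmetrically, so a later lookup from either block can reach its neighbour by way group number alone, without another tag comparison.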
  • FIG. 5 shows another embodiment 500 of the present invention, in which the second level cache is organized as two way groups.
  • the target instruction address 312 is described using a portion of the full instruction address.
  • a level one instruction block contains 4 instructions, so the offset 303 in the instruction address 312 has 2 bits; this offset, called BN1Y, determines the position of an instruction within its level one instruction block.
  • the track table has 128 rows, so the level one block number BN1X (that is, the aforementioned BNX1) has 7 bits, and its value is given by the row number.
  • BN1X concatenated with BN1Y is called BN1, which determines the position of the instruction in the level one instruction cache 112.
  • block offset 306 is 2 bits.
  • block offset 306 concatenated with offset 303 is called BN2Y.
  • the index 307 is 10 bits; these 10 bits together with the corresponding way group number are called the secondary block number BN2X (consistent with the aforementioned BNX2).
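  • The field widths listed above can be illustrated with a short sketch. The function names are hypothetical, and the address is assumed to count instructions (not bytes), so the lowest two bits are the offset 303.

```python
from typing import NamedTuple

OFFSET_BITS = 2        # offset 303 -> BN1Y (4 instructions per level one block)
BLOCK_OFFSET_BITS = 2  # block offset 306 (4 level one blocks per level two block)
INDEX_BITS = 10        # index 307


class Fields(NamedTuple):
    tag: int           # tag 311
    index: int         # index 307
    block_offset: int  # block offset 306
    offset: int        # offset 303


def split_address(addr: int) -> Fields:
    """Split an instruction address into tag, index, block offset and offset."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    block_offset = (addr >> OFFSET_BITS) & ((1 << BLOCK_OFFSET_BITS) - 1)
    index = (addr >> (OFFSET_BITS + BLOCK_OFFSET_BITS)) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + BLOCK_OFFSET_BITS + INDEX_BITS)
    return Fields(tag, index, block_offset, offset)


def make_bn1(bn1x: int, bn1y: int) -> int:
    """BN1 = BN1X (7 bits, track table row) concatenated with BN1Y (offset 303)."""
    return (bn1x << OFFSET_BITS) | bn1y


def make_bn2(way: int, index: int, block_offset: int, offset: int) -> int:
    """BN2 = BN2X (way group number, index) concatenated with BN2Y (block offset, offset)."""
    bn2x = (way << INDEX_BITS) | index
    bn2y = (block_offset << OFFSET_BITS) | offset
    return (bn2x << (OFFSET_BITS + BLOCK_OFFSET_BITS)) | bn2y
```

    For example, `split_address((5 << 14) | (3 << 4) | (2 << 2) | 1)` yields tag 5, index 3, block offset 2 and offset 1.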
  • the structure of this embodiment is basically the same as that in FIG. 4, and the only change is the active table 104.
  • Each row in the table gains entries that store the address of the previous instruction block and the address of the next instruction block of the instruction block represented by the row, together with a selector that serves these new entries.
  • Each row in the left-hand array (representing a L2 cache block) stores four entries of the four L1 cache block addresses in the row, in addition to the original storage tag entry 118 in FIG.
  • an entry 501 storing the address of the previous L2 cache block in the order of addresses and an entry 503 storing the next L2 cache block address in the order are added.
  • in the left array, the output of entry 408 is still selected by the original selector 521; the output of the selector 521 and the outputs of the newly added entries 501, 503 are in turn selected by the newly added selector 531.
  • the right array likewise adds an entry 502 storing the previous L2 cache block address, an entry 504 storing the next L2 cache block address, and a selector 532 corresponding to the selector 531.
  • comparator 420 controls a tri-state gate to put the output of selector 531 on the bus to the track table
  • 110 for storage; the comparator 422 controls another tri-state gate to put the output of the selector 532 on the same bus for storage in the track table 110.
  • the results of comparing tag 118 and tag 120 with the input address determine which selector output (which instruction address) is sent to the track table 110 for storage.
  • the index address of the previous or next secondary instruction block of the current secondary instruction block can be obtained by decrementing or incrementing by 1 the index address of the current secondary instruction block (i.e., 307 in Figure 4). The added entries 501, 502 for the previous block address and entries 503, 504 for the next block address therefore only need to store the way group number of the previous or next secondary instruction block of the current secondary instruction block.
  • the 'branch source instructions' are all direct branch instructions unless otherwise specified.
  • the scanner 108 examines the secondary instruction sub-block being filled from the secondary cache 106 into the primary cache 112.
  • the branch target address of the branch source instruction is calculated.
  • in order to reduce power consumption, that is, to reduce the number of accesses to the active table 104, the scanner 108 determines whether the location of the branch target instruction lies beyond the boundary of the current level one instruction block, of the current level two instruction block, or of the previous or next level two instruction block of the current level two instruction block, and thereby reduces the frequency of access to the active table.
  • the address boundary of the branch target address is determined by adding the branch address offset to the low-order bits of the base address.
  • the branch offset (OFFSET) 571 is added to the base address low-order bits 581, and the carry signals (574, 575, and 576) are extracted at three boundaries of the adder.
  • the three signals are processed with priority logic such that an asserted 'in-boundary' signal representing a larger data block invalidates the 'in-boundary' signal representing a smaller data block.
  • the base address low-order bits 581 are divided into three parts; the first part is the offset of the base address 311.
  • the second part is the block offset 306, and the third part 579 is the bit one position higher than the block offset 306 in the address.
  • the branch offset 571 is divided into two parts: the low part 573 corresponds to the low-order bits 581 of the base address 311,
  • and the remaining part is the high part 572.
  • the generated sum 582 is divided into three parts at the same boundaries as the base address, and the carry signals 574, 575, and 576 are generated at the respective boundaries.
  • the method for determining the address boundary judgment is as follows:
  • if the high part 572 of the branch offset 571 is all '0' and the carry signal 576 is '1', the branch target address lies beyond the next level two instruction block of the level two instruction block in which the branch source instruction is located. This situation is handled in the same way as case 1 and is also treated as case 1.
  • the address boundary can also be determined according to the foregoing method. The difference is that it is first determined whether the high part 572 of the branch offset 571 is all '1'. If the high part 572 of the branch offset 571 is not all '1', this is case 1; if the high part 572 is all '1' and the carry signals 574, 575 and 576 are all '0', this is case 2; if the high part 572 is all '1', the carry signal 574 is '1', and the carry signals 575 and 576 are '0', this is case 3; if the high part 572 is all '1', the carry signal 575 is '1', and the carry signal 576 is '0', this is case 4; if the high part 572 of the branch offset 571 is all '1', and the carry
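  • A behavioural sketch of the boundary classification may help here. It is not the carry-based logic of the adder described above: it computes the full target address and compares block numbers, which is an assumption made for readability. The case numbers follow the four cases used in the scanner walk-through (1: outside the neighbouring level two blocks, 2: same level one block, 3: same level two block, 4: previous or next level two block), and addresses are assumed to count instructions.

```python
L1_SHIFT = 2   # 4 instructions per level one block (offset 303)
L2_SHIFT = 4   # 4 level one blocks per level two block (block offset 306)


def classify_boundary(source_addr: int, branch_offset: int) -> int:
    """Return the address boundary case (1-4) for a direct branch."""
    target = source_addr + branch_offset
    if target >> L1_SHIFT == source_addr >> L1_SHIFT:
        return 2                       # same level one block: same BN1X
    source_l2 = source_addr >> L2_SHIFT
    target_l2 = target >> L2_SHIFT
    if target_l2 == source_l2:
        return 3                       # same level two block: same BN2X
    if abs(target_l2 - source_l2) == 1:
        return 4                       # index differs by +1 or -1
    return 1                           # must be looked up in the active table
```

    Only case 1 requires a tag comparison in the active table; the other three cases can be resolved from information tied to the branch source block.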
  • the branch target instruction address is calculated by the BN1X base address and the PC address of the instruction segment temporarily stored in the scanner, and the calculated branch target location is as follows.
  • when the scanner 108 examines an instruction whose address boundary determination is case 1,
  • the branch target instruction address calculated by the scanner 108 is sent to the active table 104 via the bus 507, the corresponding row is read according to the index bits of that address, and the tags read out are matched against the tag of the branch target instruction address calculated by the scanner 108.
  • If the match succeeds, subsequent operations are as described above. If the match fails, the instruction block at the calculated branch target address is fetched from the lower level memory and filled into the second level cache block determined by the replacement policy, and subsequent operations are as described above.
  • when the scanner 108 examines an instruction whose address boundary determination is case 2,
  • the branch target address and the branch source address are in the same level one instruction block, that is, the branch target instruction and the branch source instruction have the same BN1X.
  • the tri-state gates of all way groups, such as the tri-state gate 541, are forced off, and the branch source BN1X stored in the scanner and the calculated offset 303 (i.e., branch target BN1Y) are merged into BN1 and written via the bus 505 into the scanner 108.
  • when the branch source instruction is to be executed, the instruction can be read directly from the level one cache
  • for use by the processor 116.
  • when the scanner 108 examines an instruction whose address boundary determination is case 3, the branch target address and the branch source address are in the same level two instruction block, that is, the branch target instruction and the branch source instruction have the same BN2X. In this case, the BN2X (including the way group number and the index bits) of the instruction block containing the branch source instruction, as stored in the scanner, is used via the bus 507 as an index to read the second storage block (such as the second storage block 408 or 410) in the corresponding entry of the active table 104, and the calculated block offset 306 (Block-offset) selects the content of the corresponding storage domain in that second storage block.
  • if the BN1X value stored in that storage domain is valid, the tri-state gate of the way group corresponding to the way group number in the branch source BN2X is forced on and the tri-state gates of the other way groups are forced off; the BN1X value is sent to the track table 110 via the bus 508, and the branch target BN1Y, obtained by removing the block offset 306 from the calculated branch target BN2Y, is sent to the track table 110 via the bus 505. The two are merged into the branch target BN1, which is written into the entry of the track table 110 pointed to by the branch source BN1X and BN1Y temporarily stored in the scanner 108.
  • if the BN1X value stored in that storage domain is invalid, the tri-state gates of all way groups are forced off, and the branch source BN2X stored in the scanner 108 and the calculated branch target BN2Y are merged into BN2 and written via the bus 505 into the entry of the track table 110 pointed to by the branch source BN1X and BN1Y temporarily stored in the scanner 108. Subsequent operations are as described above.
  • when the scanner 108 examines an instruction whose address boundary determination is case 4,
  • the index value of the branch target instruction differs from the index value of the branch source instruction by '±1' (the index value of the previous level two instruction block differs from that of the branch source instruction by '-1', and the index value of the next level two instruction block differs from it by '+1').
  • the branch source BN2X (including the way group number and the index bits) stored in the scanner is used via the bus 507 as an index to read the third storage block in the corresponding entry of the active table 104 (such as the third storage block 501, 502 or 503, 504). According to the address boundary determination result, when the branch target address is in the previous level two instruction block of the branch source address, the corresponding storage domain P is selected (such as in the third storage block 501 or 502); when the branch target address is in the next level two instruction block of the branch source address, the corresponding storage domain N is selected (such as in the third storage block 503 or 504).
  • if the way group number stored in the selected storage domain is valid, it is sent to the track table 110 via the bus 508, and the new index value obtained by decrementing or incrementing the branch source index stored in the scanner 108, together with the calculated branch target BN2Y value, is sent to the track table 110 via the bus 505. These are merged into the branch target BN2, which is written into the entry of the track table 110 pointed to by the branch source BN1X and BN1Y temporarily stored in the scanner 108. If the way group number stored in the selected storage domain is invalid, the branch target address calculated by the scanner 108 is sent to the active table via the bus 506 for index matching, and subsequent operations are the same as for address boundary case 1.
  • the frequency of reading tags from the active table 104 for comparison with the address is reduced, but in cases 2 and 3 it is still necessary to directly access, by the way group number and the index address 307, the entries 408, 410 in a row of the active table 104 in FIG. 5 to obtain the level one instruction address within the same secondary instruction block, or the previous level two address in the entries 501, 502, or the next level two address in the entries 503, 504. If, when the scanner 108 scans an instruction block being filled from the lower-level buffer 126 or 128 into the upper-level buffer, the above entries in the active table 104 row corresponding to the instruction block (same way group number and same index address as the instruction block) are also filled into the scanner 108 for temporary storage, the frequency of access to the active table 104 can be further reduced.
  • if this temporary register has a plurality of independent read ports, the plurality of branch instructions in the instruction segment being scanned can each, simultaneously and according to the address boundary case of its own branch target instruction, independently access the read port assigned to that instruction to map its branch target to an address in BN1 or BN2 form for storage in the track table 110.
  • Figure 6 is another embodiment 600 of a scanner constructed in a level two cache structure of the present invention.
  • each instruction block of the upper-level buffer 112 contains 4 instructions, that is, the offset 303 (BNY) address is two bits; each cache block of the lower-level buffer 126 or 128 contains 4 upper-level cache blocks, that is, the block offset 306 address is also two bits.
  • one row of the active table 104 corresponds to a lower-level cache block; the row contains four entries storing BN1X addresses, such as 408, an entry such as 501 storing the way group number of the previous lower-level cache block, and an entry such as 503 storing the way group number of the next lower-level cache block.
  • the instructions are to the upper level buffer 112.
  • the scanner 108 also includes a micro active block 660. The entire scanner 608 can be substituted for the scanner 108 of Figure 5, and the other top-level structures are the same as in Figure 5. Only the track table 110 is shown in the figure.
  • 660 also contains three entries 624, 625 and 626: entry 624 stores the way group number of the previous lower-level cache block, taken from the 501 entry in 104; entry 625 stores the way group number and index address of the current lower-level cache block; and entry 626 stores the way group number of the next lower-level cache block, taken from the 503 entry in 104.
  • entry 625 stores the level two way group number and index address of the instruction block being scanned, which are read into the scanner 608 at the same time as the instruction block.
  • there are 5 selectors 570-574 in the micro active block, of which the 4 selectors 570-573
  • have the same structure; each, according to the address boundary determination generated by its corresponding decoding and judging sub-block, selects the contents of the entries 630-636 and, directly or after an operation, generates a BN1X or BN2X address, which together with the address offset 303 calculated by an adder such as 607
  • is stored, as the branch target address of the scanned instruction, in the entry corresponding to the scanned instruction in the track table.
  • the 5th selector 574 selects the contents of the entries 630-636 to fill the end track point of the track; its selection control differs from that of selectors 570-573.
  • the instruction decoder in the sub-block decodes the instruction it is responsible for. If the instruction is not a branch instruction, the instruction type generated by the sub-block decoding is stored in the track table entry corresponding to the instruction, and the scanner does not calculate a branch address for the instruction. If the instruction is a branch instruction (hereinafter referred to as a branch source instruction), the sub-block generates an address boundary determination as in the previous example, which is used to select the branch target address; this address is filled, together with the instruction type generated by the decoding, into the track table entry corresponding to the branch source instruction. The following example shows the case where the instruction decoded by the sub-block is a branch instruction.
  • each sub-block first checks whether the high-order bits of the branch offset 571 of the instruction are all '0'; if they are not all '0', the address boundary is determined to be case 1.
  • the instruction branch offset is added to the base address of the instruction.
  • the base address is formed by merging the value temporarily stored in the scanner (from the 408 entry of the active table 104), the index value (that is, 307 in the 625 entry), the block offset 306, and the offset 303 (BNY).
  • the 4 instructions of the instruction block each have their own base address; the leading parts are the same and only BNY differs.
  • the BNY of the first instruction in instruction order is '0', and the BNY of the following three instructions is '1', '2', '3'.
  • the resulting sum is the memory address of the branch target; the index portion 307 of this memory address is used as the address to read one row from each of the left and right arrays of the active table 104. The block offset 306 of the memory address
  • controls the selector 521 to select the BN1X stored in one of the four entries 408 of the row, which is passed by the selector 531 (fixed to select the output of 521 in address boundary case 1) to the tri-state gate 541.
  • the tag entry 118 of the row and the tag portion 311 of the branch target memory address are compared in the comparator 420; if they are equal, the comparison result enables the tri-state gate 541, and the output of the tri-state gate 541 is merged with the offset 303 (BNY) of the memory address and stored in the track table entry corresponding to the scanned instruction. If instead the tag entry 120 of the right array and the tag portion 311 of the branch target memory address compare equal in the comparator 422, the BN1X sent to the track table for storage comes from the entry 410. The principle is the same and is not described again. The following describes the case where the high-order bits of the branch offset are all '0'.
  • each decoding and judging sub-block adds the branch offset 571 of the branch instruction it is responsible for to the block offset 306 and
  • offset 303 of the instruction base address, using an adder in the sub-block such as 607 (the higher bits of the base address, such as index bits 722 and tag 721, are discarded).
  • each sub-block generates an address boundary determination from the carry signals produced by the addition, according to the foregoing method, and from that determination generates control signals that make the selector choose the appropriate value among the memory entries 620-626 to populate the track table.
  • the offset 303 (which is '0' for the first instruction in the sequence) is added in the adder 607. If the address boundary of the sub-block is determined to be case 1, then, as described above, the memory address of the branch target calculated by the scanner 608 is sent to the active table 104, mapped to the level one cache address BN1, and stored in the track table entry corresponding to the branch source instruction.
  • the address boundary determination places the block offset 306 of the sum generated by the adder 607 on control line 610 to control selector 670. If the block offset 306 is '00', the selector 670 selects the content of storage entry 620; if the valid bit of that content is 'valid', the selector 670 outputs the BN1X address in the storage entry 620; when the valid bit of the content of storage entry 620 is 'invalid', the selector 670 outputs the way group number and index bits 307 stored in the storage entry 625.
  • the output way group number and index bits 307 are merged with the block offset 306 and the offset 303 (BNY) of the sum generated by the adder 607 and sent to the first entry of a track in track table 110.
  • the track is the track corresponding to the level one cache block in the level one buffer into which the instruction block being scanned is stored.
  • when the block offset 306 generated by adder 607 is '01', '10', or '11', the selector 670 selects the storage entries 621, 622, and 623 accordingly. If the content is invalid, the entry 625 is selected, as above.
  • control line 610 controls the selector 670 to select storage entry 624 and read out the way group number in it, and to select storage entry 625 and read out the index 307. The previous-block way group number from entry 624, the index 307 from entry 625 minus '1', the block offset 306 generated by the adder 607, and the offset 303 are merged into a BN2 address, which is stored in the first entry of the above track.
  • the control line 610 controls the selector 670 to select storage entry 626 and read out the way group number in it, and to select storage entry 625 and read out the index 307. The next-block way group number from entry 626, the index 307 from entry 625 incremented by '1', the block offset 306 of the sum generated by the adder 607, and the offset 303 are merged into a BN2 address, which is stored in the first entry of the above track.
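  • The selection performed by a selector such as 670 can be sketched as below. This is a software model under stated assumptions: the `MicroActiveBlock` dataclass, the tuple return format, the `direction` argument (-1 for the previous block, +1 for the next), and the modulo-1024 index arithmetic are illustrative choices, not taken from the source. Returning `None` stands for falling back to the active table lookup of case 1.

```python
from dataclasses import dataclass
from typing import List, Optional

INDEX_MASK = (1 << 10) - 1   # 10-bit index 307, assumed to wrap modulo 1024


@dataclass
class MicroActiveBlock:
    bn1x: List[Optional[int]]   # entries 620-623; None models an invalid entry
    prev_way: Optional[int]     # entry 624: way group number of the previous block
    way: int                    # entry 625: way group number of the current block
    index: int                  # entry 625: index 307 of the current block
    next_way: Optional[int]     # entry 626: way group number of the next block


def map_target(mb, case, direction, block_offset, offset):
    """Map a branch target to a BN1 or BN2 tuple, or None if the active table is needed."""
    if case in (2, 3):                         # target inside the current level two block
        bn1x = mb.bn1x[block_offset]
        if bn1x is not None:
            return ("BN1", bn1x, offset)
        return ("BN2", mb.way, mb.index, block_offset, offset)
    if case == 4:                              # target in the previous or next block
        way = mb.prev_way if direction < 0 else mb.next_way
        if way is None:
            return None
        step = -1 if direction < 0 else 1
        return ("BN2", way, (mb.index + step) & INDEX_MASK, block_offset, offset)
    return None                                # case 1: full active table lookup
```

    A BN1 result means the target is already in the level one cache; a BN2 result defers the level one fill until the track point is actually reached.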
  • the sub-blocks also operate independently on their respective instructions in the above manner, independently determine the address boundary of the instruction, and according to that determination control the selectors 671, 672, 673 via the control lines 611, 612, 613 to select the contents of the storage entries 620-626 and, together with the sum generated by the adder in the sub-block, fill entries 2, 3, and 4 of the track, respectively.
  • the last entry in the track, the end track point, is filled by the output of selector 674.
  • the selector is directly controlled by the block offset 306 in the base address 614 of the instruction segment.
  • the selector 674 selects the storage entry 621.
  • when the valid bit in the content of storage entry 621 is 'valid', the selector 674 outputs the BN1X address in the storage entry 621; when the valid bit in the content of storage entry 621 is 'invalid', the selector 674 outputs the way group number and index bits 307 stored in the storage entry 625.
  • the output is merged with the block offset 306 generated by adder 607 incremented by '1' and with the offset 303 (BNY), and is then sent to the end entry of a track in the track table 110.
  • the selector 674 correspondingly selects the contents of storage entries 622 and 623; if the content is invalid, the entry 625 is selected, as above.
  • the selector 674 selects storage entry 626 to read out its way group number and selects storage entry 625 to read out the index 307.
  • the way group number, the index 307 from entry 625 incremented by '1', the block offset 306 generated by the adder 607, and the offset 303 are merged into a BN2 address, which is stored in the end track point entry of the above track.
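  • The end track point selection can be sketched in the same style. The function name, the tuple format, and the choice to point at the first instruction (offset 0) of the following block are assumptions for illustration; the source text leaves the exact merged fields partly garbled.

```python
INDEX_MASK = (1 << 10) - 1   # 10-bit index 307, assumed to wrap modulo 1024


def end_track_point(bn1x_entries, way, index, next_way, block_offset):
    """Return the address of the block that sequentially follows the scanned block.

    bn1x_entries: BN1X values for the 4 level one blocks (None = invalid),
    way/index:    way group number and index 307 of the current level two block,
    next_way:     way group number of the next level two block (None = invalid),
    block_offset: block offset 306 of the level one block being scanned.
    """
    if block_offset < 3:                       # next block is in the same level two block
        following = block_offset + 1
        bn1x = bn1x_entries[following]
        if bn1x is not None:
            return ("BN1", bn1x, 0)
        return ("BN2", way, index, following, 0)
    if next_way is None:                       # next level two block not linked yet
        return None
    return ("BN2", next_way, (index + 1) & INDEX_MASK, 0, 0)
```

    The last level one block of a level two block is the only one whose end track point needs the next-block way group number.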
  • the active table 104 can also be built with multiple read/write ports, so that multiple branch target addresses can access the active table simultaneously.
  • Figure 7 shows the memory and format used in the micro-track table organized in a fully associative manner.
  • Figure 7A is the structure of a memory 820 in a fully associative micro track block.
  • the memory 820 contains six entries, corresponding to a secondary instruction block containing four primary instruction blocks.
  • entry 710 holds the level one instruction block number BN1X, and its valid signal, of the level one instruction block whose block offset within the level two instruction block is '00'; entries 711, 712, and 713 hold those for block offsets '01', '10', and '11', respectively.
  • Entry 714 contains the way group number (Way number) and index address 307 (Index).
  • Entry 715 stores the way group number of the next L2 cache block.
  • module 110 is a track table
  • module 808 For the scanner, the scanner 108 in Figure 5 can be referred to.
  • the function block 801 is similar to the instruction decoding and judging module 601 in the embodiment of FIG. A function block for performing independent instruction decoding and calculating a branch target address for a plurality of instructions in an instruction block of the scanner.
  • for each instruction decoded as a branch instruction, the function block 801 adds the instruction base address (Base Address; as described for Figure 6, the high-order bits of the base addresses of the several instructions are the same, and only the lowest two bits differ according to the position of the instruction in the instruction block) to the branch offset of the instruction (Branch Offset, i.e., the branch address offset); the sum is the branch target address, and this address controls the selection of the content of the micro active block 881.
  • each of these branch target addresses can be divided into 4 parts, from high-order to low-order: a micro-tag (Tag) 721, a micro-index (Index) 722, a block offset (Block Offset) 306, and the offset 303.
  • micro-tag 721 and micro-index 722 differ from tag 311 and index 307 in the other embodiments of the present disclosure. The micro-index 722 has only two bits, because each micro active block contains only 4 active table rows, each corresponding to a level two instruction block that contains 4 level one instruction blocks; the micro-index value equals the lowest two bits of the active table index 307.
  • micro-tag 721 comprises the tag 311 and the bits of the active table index 307 other than the lowest two.
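  • The relation between the micro fields and the ordinary tag and index can be checked with a few lines. The function name is hypothetical; the 10-bit width of index 307 is taken from the FIG. 5 embodiment and is an assumption for this embodiment.

```python
def split_micro(addr: int):
    """Split a branch target address into (micro_tag, micro_index, block_offset, offset)."""
    offset = addr & 0b11                 # offset 303
    block_offset = (addr >> 2) & 0b11    # block offset 306
    micro_index = (addr >> 4) & 0b11     # micro-index 722: lowest two bits of index 307
    micro_tag = addr >> 6                # micro-tag 721: tag 311 plus upper index bits
    return micro_tag, micro_index, block_offset, offset


def micro_tag_from(tag: int, index: int) -> int:
    """Build micro-tag 721 from tag 311 and the upper 8 bits of a 10-bit index 307."""
    return (tag << 8) | (index >> 2)
```

    Because the micro-tag absorbs the upper index bits, a single equality comparison per micro active block replaces the indexed read plus tag comparison of the full active table.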
  • the first three parts 721, 722 and 306 are sent via buses 810, 811, 812, 813 to each micro active block (such as micro active blocks 881, 883) to control the selectors; the offset 303 is combined with the BNX output by the corresponding selector into a complete BN
  • address, which populates the entries in the track table 110.
  • the micro active block 881 contains the memories 820, 821, 822, 823, which store track table entries, and the multiplexers 870, 871, 872, 873, 874.
  • the memory such as memory 820 is the structure in Figure 7A.
  • the micro active block 881 has a micro-tag register 851, which stores the base address of the consecutive instructions corresponding to the active table entries stored in the micro active block 881. There are also 4 comparators 860, 861, 862, 863. One input of each of the four comparators is connected to the output of register 851, and the other input is connected to one of the four branch target addresses 810, 811, 812, and 813, respectively. The 4 branch target addresses 810, 811, 812, 813 are sent to the micro active blocks 881 and 883 (883 has the same structure as 881) and compared with the micro-tags in the micro-tag registers. Suppose that, in the micro active block 881, the micro-tag portion 721 of the branch target address
  • 810 is compared by the comparator 860 and found equal to the micro-tag in the micro-tag register 851. Comparator 860 then lets the micro-index 722 and block offset 306 of branch target address 810 control selector 870.
  • the micro-index 722 selects one of the four memories. When the micro-index is '00', memory 820 is selected; when the micro-index is '01', '10', '11', memory 821, 822, 823 is selected, respectively.
  • block offset 306 selects one of the 4 BN1X groups in the selected memory.
  • when the valid bit in the selected group is 'valid', the selector 870 outputs the BN1X address in the selected group; when the valid bit in the selected group is 'invalid', selector 870 outputs memory 820
  • the output is ORed, via an OR gate 840, with the corresponding output from the other micro active block 883, combined with the offset 303 from adder 607, and sent to the track table 110,
  • where it is written into the first entry of the track pointed to by address bus 505.
  • suppose the micro-tag portion 721 of the branch target address 811, when compared by the comparator 861, is found to differ from the micro-tag in the micro-tag register 851; the comparator 861 then sends a signal controlling selector 871 to output all '0', so that it does not affect the corresponding output of the other micro active blocks (such as micro active block 883).
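  • The fully associative lookup and the OR-combining of the micro active block outputs can be modelled as follows. The class, the output word encoding (`HIT` and `IS_BN1` flag bits, field positions) and the assumption that micro-tags are unique across blocks are all illustrative; the source only states that a non-matching block outputs all '0' so that the OR gate passes the matching block's output unchanged.

```python
HIT = 1 << 16      # assumed encoding: marks that some micro active block matched
IS_BN1 = 1 << 15   # assumed encoding: payload is a BN1X rather than a BN2X


class MicroActiveBlock:
    def __init__(self, micro_tag, rows):
        self.micro_tag = micro_tag   # content of the micro-tag register (e.g. 851)
        self.rows = rows             # 4 memories; each is (bn1x_list, way, index)

    def select(self, micro_tag, micro_index, block_offset):
        """Model one comparator plus selector; all zeros on a micro-tag mismatch."""
        if micro_tag != self.micro_tag:
            return 0
        bn1x_list, way, index = self.rows[micro_index]
        bn1x = bn1x_list[block_offset]
        if bn1x is not None:                       # valid BN1X: level one address
            return HIT | IS_BN1 | bn1x
        # invalid BN1X: way group number, index 307 and block offset 306 instead
        return HIT | (way << 12) | (index << 2) | block_offset


def lookup(blocks, micro_tag, micro_index, block_offset):
    """OR together the outputs of all micro active blocks (the role of OR gate 840)."""
    word = 0
    for block in blocks:
        word |= block.select(micro_tag, micro_index, block_offset)
    return word                                    # 0 means: go to the active table
```

    A result of zero corresponds to the situation in which no micro-tag matches and the branch target has to be sent to the active table.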
  • the branch target 811 is sent to the active table.
  • the entry corresponding to the branch target address 811 is read out and filled into the second entry of the track in the track table pointed to by the address bus 505.
  • the remaining two branch target instruction addresses 812 and 813 likewise control selectors 872 and 873, respectively, to output one of the 16 BN1X values; or the way group number and index bits 307, together with the block offset 306 of the target instruction address; or all '0'.
  • the output is merged with the corresponding BN1Y and, after being ORed with the corresponding output of the micro active block 883, is sent to the track table 110 and written into the 3rd and 4th entries of the above track. If an instruction is not a branch instruction, the instruction decoding causes the comparator corresponding to that instruction to perform no comparison, such as an instruction.
  • the non-branch instruction type generated by the instruction decoding is stored in the third entry of the above track in the track table 110.
  • the next block address stored in the end track point of the track is provided by a similar compare, decode, and select function.
  • selector 874 is connected to memory 820 etc. differently from selectors 870-873. Under the same address control, selector 874 selects the input that is next in address order relative to the one selected by 870-873. If the micro-index 722 and block offset 306 bits of the address are '0000', selectors 870-873 select the entry 710 in the memory 820, but for the same address selector 874 selects the entry 711 in the memory 820; if the micro-index 722 and the block offset 306 of the address are '0011', selectors 870-873 select the entry 713 in the memory 820,
  • but the selector 874 selects the entry 710 in the memory 821. The case where the micro-index and the block offset 306 of the address are '1111' is special: selectors 870-873 select the entry 713 in the memory 823, but the selector 874 selects the way group number in the entry 715 of the memory 823 and the level two instruction block number in the entry 714 plus '1', which together with the block offset 306 of the address form the next block address.
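  • The "next in address order" wiring of selector 874 amounts to incrementing the 4-bit value formed by micro-index and block offset. A minimal sketch, with a hypothetical function name and with memories numbered 0-3 for 820-823 and entries 0-3 for 710-713:

```python
def next_slot(micro_index: int, block_offset: int):
    """Return (memory, entry) picked by selector 874, or None for the '1111' case.

    None stands for the special case in which the next block lies in the next
    level two block, so the way group number in entry 715 and the block number
    in entry 714 plus one are used instead of a BN1X entry.
    """
    position = (micro_index << 2) | block_offset
    if position == 0b1111:
        return None
    following = position + 1
    return (following >> 2, following & 0b11)
```

    For instance, address bits '0011' select entry 713 of memory 820 for selectors 870-873, while `next_slot(0, 3)` gives `(1, 0)`, i.e. entry 710 of memory 821.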
  • the micro-tag 721 in the block address 814 i.e., the base address of the instruction block being processed) is sent to each micro-active block for comparison with the micro-tag stored therein.
  • the comparator 864 in the micro active block 881 compare the output of the pico and microtag registers 851 on the block address 814, and the comparison result is the same, then the comparator 864 takes the block address 814. Index 722 and block offset 306 on the top control selector 874.
  • the entry is output.
  • selector 874 selects the way number in entry 724 of memory 823, which is output together with index address 307 and block offset 306 on address 814.
  • address format 760 is the level-1 cache address format, consisting of BN1X 761 and offset BNY 303; address format 780 is the level-2 cache address format, consisting of way number 781, index address 307, block offset 306, and offset BNY 303.
  • if the micro-tag in 814 matches none of the micro active blocks (such as micro active blocks 881 and 883) in scanner 800, branch target address 811 is sent to the active table in this embodiment.
  • the row pointed to by branch target address 811 can be filled into the micro active block designated by the replacement logic (such as LRU) of scanner 800.
  • in the designated micro active block, the original entries in the memory specified by micro index bits 722 of branch target 811 are replaced, for example when the micro index bits are '10'.
  • the method is to fill the four BN1Xs and their valid signals in the row of active table 104 pointed to by branch target 811 into entries 710, 711, 712, and 713; the way number and index 307 of the active table row are filled into entry 714 as the level-2 cache block number of the block;
  • the way number in next-block entry 503 of the active table row is filled into entry 715.
  • the micro-tag in branch target 811 is stored in micro-tag register 851 of micro active block 883, and the valid bits in memories 820, 821, and 823 are set to 'invalid'.
  • Thereafter, each entry in memories 820, 821, and 823 can be updated during periods in which the active table is not being accessed.
  • the replacement logic can designate a micro active block as the replacement target according to a particular algorithm.
  • Taking LRU as an example, each micro active block stores a multi-bit count value whose lowest bit is on the right. Whenever any comparator in the block matches, the count value is shifted left by one bit and a '1' is filled into the lowest bit.
  • the replacement logic observes the count values in all blocks. If the lowest bit of any count value is '0', the micro active block holding that count value is the replacement target.
  • otherwise, the replacement logic shifts the count values in all micro active blocks right by one bit at a time until the lowest bit of one of the count values is '0'; the micro active block holding that count value is the replacement target.
  • the present invention can also support scanner 108 with a set-associative micro active block, so that all instructions in the instruction block being scanned are address-mapped simultaneously.
  • the set-associative micro active block is similar in structure to a scaled-down active table 104: for example, it has the same number of columns and the same entries but only 8 rows, and it has 4 read ports corresponding to the at most 4 instructions in an instruction block. Each read port corresponds to one entry in the track table 110.
  • the selectors 521 and 531, comparator 420, tri-state gate 541, and so on in the figure are each provided in 4 sets.
  • the four branch addresses of the four branch instructions are used to address the set-associative micro active block.
  • the read ports read out 8 rows of content: from the BN1X addresses in each of the 8 rows, one is selected by the block offset 306 of the corresponding one of the 4 branch addresses; and the 8 micro-tags (longer than tag 311, since they include the bits of index 307 other than the lowest 3 bits) are compared with the micro-tags in the 4 branch addresses in 8 comparators.
  • For each read port, the comparison result of whichever of the two ways matches drives a tri-state gate,
  • which writes the BN1X selected by the above 306 in that way of the read port into the track table entry corresponding to the read port.
  • Each of the four read ports writes one entry in the track.
  • the apparatus and method proposed by the present invention can be used in various cache related applications, and the efficiency of the cache can be improved.
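
The LRU-style selection among micro active blocks described above (shift left and fill '1' on a match; shift all counts right until some lowest bit is '0') can be sketched as follows. This is only an illustrative model: the 4-bit counter width, the tie-break toward the lowest-numbered block, and the names `MicroActiveBlock` and `choose_victim` are assumptions, not details taken from the disclosure.

```python
COUNT_BITS = 4  # assumed width of the per-block count value


class MicroActiveBlock:
    def __init__(self) -> None:
        self.count = 0  # all zeros: never matched

    def on_match(self) -> None:
        # Shift left by one bit and fill '1' into the lowest bit,
        # discarding any bit shifted out of the assumed width.
        self.count = ((self.count << 1) | 1) & ((1 << COUNT_BITS) - 1)


def choose_victim(blocks):
    """Return the index of the block to replace."""
    while True:
        for i, block in enumerate(blocks):
            if block.count & 1 == 0:
                return i  # lowest bit is '0': replacement target
        # No candidate yet: shift every count right by one bit and retry.
        # After at most COUNT_BITS shifts every count is zero, so this ends.
        for block in blocks:
            block.count >>= 1
```

In this model a block that has matched less often loses its trailing '1' bits sooner and is therefore chosen first.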

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A system and a method for caching a high-performance instruction are applied to the field of processors, which can fill, before a processor core executes an instruction, the instruction in a high-speed memory that can be directly accessed by the processor core, so that the processor core can acquire a needed instruction from the high-speed memory almost each time, thereby achieving a high cache hit rate.

Description

High-performance instruction cache system and method
Technical field
The invention relates to the fields of computers, communications, and integrated circuits.
Background art
A cache generally holds a copy of part of the contents of a lower-level memory so that a higher-level memory or the processor core can access those contents quickly, keeping the pipeline running without interruption.
Current caches are all addressed in the following way: the index field of the address is used to read a tag from the tag memory, and that tag is matched against the tag field of the address; the index field and the intra-block offset field of the address together are used to read the contents of the cache. If the tag read from the tag memory equals the tag field of the address, the contents read from the cache are valid, which is called a cache hit. Otherwise, if the tag read from the tag memory differs from the tag field of the address, the contents read from the cache are invalid, which is called a cache miss. In a multi-way set-associative cache, these operations are performed in parallel on every way to detect which way hits. The contents read from the hitting way are valid. If all ways miss, all of the contents read are invalid. After a cache miss, the cache control logic fills the cache with contents from the lower-level storage medium.
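
As a minimal sketch of the conventional lookup described above (not the invention itself), the following Python model splits an address into tag, index, and offset fields and checks every way of one set. The field widths, the two-way organization, and the names `split_address` and `SetAssociativeCache` are assumptions chosen for illustration only.

```python
OFFSET_BITS = 4  # assumed: 16 entries per block
INDEX_BITS = 6   # assumed: 64 sets
WAYS = 2         # assumed: two ways


def split_address(addr):
    """Split an address into (tag, index, offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class SetAssociativeCache:
    def __init__(self):
        # tags[way][index] is the stored tag, or None for an empty line
        self.tags = [[None] * (1 << INDEX_BITS) for _ in range(WAYS)]
        self.data = [[None] * (1 << INDEX_BITS) for _ in range(WAYS)]

    def fill(self, addr, block, way):
        """Fill one block from the lower-level memory into the given way."""
        tag, index, _ = split_address(addr)
        self.tags[way][index] = tag
        self.data[way][index] = block

    def lookup(self, addr):
        """Return (way, content) on a hit, or None on a miss."""
        tag, index, offset = split_address(addr)
        for way in range(WAYS):  # hardware compares all ways in parallel
            if self.tags[way][index] == tag:
                return way, self.data[way][index][offset]
        return None
```

The loop over ways stands in for the parallel tag comparison; it is this per-access read-and-compare across all ways that the following section identifies as a power and speed cost.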
Technical problem
In the prior art, power and speed constraints apply (for example, a multi-way set-associative cache must read out and compare the contents and tags addressed by the same index in all ways simultaneously), so to achieve higher performance, lower-level caches are generally given more ways than higher-level caches. In addition, cache misses fall into three categories: compulsory misses, conflict misses, and capacity misses. In the prior art, compulsory misses are unavoidable except for the small fraction that are successfully prefetched.
Modern cache systems usually consist of multi-level, multi-way set-associative caches. Newer cache structures, such as victim caches, trace caches, and prefetching, are all built on and improve the basic cache structure described above. However, as the processor/memory speed gap widens, the current architecture, and the various kinds of cache misses in particular, has become the most serious bottleneck limiting performance gains in modern processors.
Technical solution
The method and system proposed by the present invention can directly address one or more of the above or other difficulties.
The present invention provides a high-performance instruction cache method, characterized in that a processor core is connected to a first memory containing executable instructions and to a second memory that is faster than the first memory. The method comprises: examining instructions as they are filled from the first memory into the second memory, thereby extracting instruction information that includes at least branch information; establishing a plurality of tracks based on the extracted instruction information; and, according to one or more of the plurality of instruction tracks, filling at least one instruction that may be executed by the processor core from the first memory into the second memory. The method further provides that the second memory is fully associative and the first memory is set-associative.
Optionally, in the method, tracks correspond one-to-one with the instruction blocks in the second memory.
Optionally, in the method, the target address is addressed by a level-1 block number, thereby determining whether the target instruction belongs to an instruction block in the second memory.
Optionally, in the method, a level-2 block number is written into the track table as the result of matching, and is changed to a level-1 block number once the instruction in the first memory has been filled into the second memory.
Optionally, in the method, the tracks are scanned, and whenever a reference to an active table block number is found, the flag bit of that block number in the active table is set; at the same time, the flag bits of the block numbers in the active table are reset one after another. A set flag bit therefore indicates a block number that is currently referenced by a track, so that it is not replaced out of the active table.
The present invention also provides a high-performance instruction cache system, characterized in that the system comprises: a processor core for executing instructions; a first memory for storing instructions required by the processor core; a second memory for storing instructions required by the processor core, the second memory being faster than the first memory; a scanner for examining instructions as they are filled from the first memory into the second memory, thereby extracting instruction information that includes at least branch information; and a track table for storing a plurality of tracks established from the extracted instruction information. Further, the second memory is fully associative and the first memory is set-associative.
Optionally, in the system, the tracks in the track table correspond one-to-one with the instruction blocks in the second memory.
Optionally, in the system, each instruction block in the second memory corresponds to one level-1 block number.
Optionally, in the system, the track table is scanned, and whenever a reference to an active table block number is found, the flag bit of that block number in the active table is set; at the same time, the flag bits of the block numbers in the active table are reset one after another. A set flag bit therefore indicates a block number that is currently referenced by the track table, so that it is not replaced out of the active table.
Optionally, in the method, when the instruction block that precedes or follows, in sequential address order, an instruction block in the first memory is itself already stored in the first memory, the active table stores the location in the first memory of that preceding or following instruction block.
Optionally, in the method, when an instruction lies in the instruction block preceding or following the current instruction block in the first memory, the instruction can be found directly in the first memory using the location information of that preceding or following instruction block stored in the active table.
Optionally, in the method, a boundary check is performed on the branch target instruction address; depending on the result, branch target instructions at different locations are given addresses in different formats.
Optionally, in the method, if the branch target instruction address lies in the instruction block preceding or following the instruction block that holds the branch instruction in the first memory, the level-2 block number of that preceding or following instruction block is used as the level-2 block number of the branch target instruction, and the portion of the branch target instruction address that corresponds to the address offset within the first memory is used as the offset of the branch target instruction.
Optionally, in the method, the active table contents corresponding to the instructions being filled from the first memory into the second memory are stored in a micro active table. If examination finds that the branch target instruction lies in a different level-1 instruction block within the same level-2 instruction block as the branch instruction, and the level-1 block number for that level-1 instruction block in the micro active table is valid, the level-1 block number read from the micro active table is used directly as the level-1 block number of the branch target instruction. If examination finds that the branch target instruction lies in a different level-1 instruction block within the same level-2 instruction block as the branch instruction, but the level-1 block number for that level-1 instruction block in the micro active table is invalid, the level-2 block number of the branch instruction is used directly as the level-2 block number of the branch target instruction. If examination finds that the branch target instruction lies in the level-2 instruction block preceding or following that of the branch instruction, and the level-2 block number for that preceding or following level-2 instruction block in the micro active table is valid, the level-2 block number read from the micro active table is used directly as the level-2 block number of the branch target instruction.
Optionally, in the method, a plurality of level-2 block numbers and the active table contents corresponding to those block numbers are stored in a micro active table. When examination finds a branch target instruction, the branch target instruction address is first matched in the micro active table. If the match succeeds, the level-1 or level-2 block number read from the micro active table is used directly as the level-1 or level-2 block number of the branch target instruction; if the match fails, the branch target instruction address is then sent to the active table for matching.
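
The lookup order described above can be sketched as follows. This is only an illustrative model under stated assumptions: each table is modeled as a Python dictionary keyed by a level-2 block address, and the names `TableRow` and `resolve_target` are hypothetical rather than taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class TableRow:
    """One row of the (micro) active table, as modeled for this sketch."""
    bnx2: int                  # level-2 block number of the row
    bnx1: List[Optional[int]]  # per level-1 block: block number, or None if invalid


def resolve_target(
    l2_block_addr: int,
    l1_slot: int,
    micro_table: Dict[int, TableRow],
    active_table: Dict[int, TableRow],
) -> Optional[Tuple[str, int, str]]:
    """Return (address format, block number, answering table), or None on a miss."""
    # Try the micro active table first; only on a miss go to the active table.
    for name, table in (("micro", micro_table), ("active", active_table)):
        row = table.get(l2_block_addr)
        if row is None:
            continue
        bnx1 = row.bnx1[l1_slot]
        if bnx1 is not None:
            return ("BN1", bnx1, name)  # valid level-1 block number
        return ("BN2", row.bnx2, name)  # otherwise use the level-2 block number
    return None
```

Returning which table answered makes it easy to see how often the smaller table alone resolves a target, which is the point of checking it first.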
Optionally, in the system, the entries of the active table correspond one-to-one with the instruction blocks in the first memory, and each entry stores the block address of the corresponding instruction block in the first memory; and when the instruction block that precedes or follows, in sequential address order, an instruction block in the first memory is itself already stored in the first memory, the active table also stores the location in the first memory of that preceding or following instruction block.
Optionally, in the system, a boundary check is performed on the branch target instruction address; depending on the result, branch target instructions at different locations are given addresses in different formats.
Optionally, the system includes one or more adders. Each adder adds the low-order bits of the part of the branch instruction's own address that lies outside the offset corresponding to the first memory to the corresponding bits of the branch distance, to determine whether the branch target instruction lies in the instruction block that precedes or follows, in sequential address order, the instruction block holding the branch instruction in the first memory. When the branch target instruction lies in the instruction block preceding or following the current instruction block in the first memory, the instruction can be found directly in the first memory using the location information of that preceding or following instruction block stored in the active table.
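
The boundary check can be illustrated with a short sketch. Note the simplification: the text describes a narrow adder working only on selected address bits, whereas this model uses full-width integer arithmetic to reach the same classification. The 64-byte block size and the name `classify_target` are assumptions made for illustration.

```python
OFFSET_BITS = 6  # assumed: 64-byte instruction blocks in the first memory


def classify_target(branch_addr: int, branch_distance: int) -> str:
    """Classify where the branch target falls relative to the branch's own block."""
    target = branch_addr + branch_distance
    # Compare block addresses (the address bits above the in-block offset).
    block_delta = (target >> OFFSET_BITS) - (branch_addr >> OFFSET_BITS)
    if block_delta == 0:
        return "same"
    if block_delta == 1:
        return "next"
    if block_delta == -1:
        return "previous"
    return "other"
```

For "previous" and "next", the location information kept in the active table would let the target be found without a full tag match; "other" targets still need the normal matching path.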
Optionally, the system further includes a micro active table, which stores the active table contents corresponding to the instructions being filled from the first memory into the second memory. If the scanner's examination finds that the branch target instruction lies in a different level-1 instruction block within the same level-2 instruction block as the branch instruction, and the level-1 block number for that level-1 instruction block in the micro active table is valid, the level-1 block number read from the micro active table is used directly as the level-1 block number of the branch target instruction. If examination finds that the branch target instruction lies in a different level-1 instruction block within the same level-2 instruction block as the branch instruction, but the level-1 block number for that level-1 instruction block in the micro active table is invalid, the level-2 block number of the branch instruction is used directly as the level-2 block number of the branch target instruction. If examination finds that the branch target instruction lies in the level-2 instruction block preceding or following that of the branch instruction, and the level-2 block number for that preceding or following level-2 instruction block in the micro active table is valid, the level-2 block number read from the micro active table is used directly as the level-2 block number of the branch target instruction.
Optionally, the system further includes a micro active table, which stores a plurality of level-2 block numbers and the active table contents corresponding to those block numbers. When the scanner's examination finds a branch target instruction, the branch target instruction address is first matched in the micro active table. If the match succeeds, the level-1 or level-2 block number read from the micro active table is used directly as the level-1 or level-2 block number of the branch target instruction; if the match fails, the branch target instruction address is then sent to the active table for matching.
Beneficial effects
The system and method of the present invention can provide a fundamental solution for the cache structures used in digital systems. Unlike conventional cache systems, which fill only after a cache miss, the system and method of the present invention fill the instruction cache before the processor executes an instruction, and can thus largely hide compulsory misses. In addition, the system and method essentially use a fully associative structure for the level-1 cache and a set-associative structure for the level-2 cache, which in effect approximates a fully associative structure, avoids capacity misses, and also increases the operating speed of the processor. Because the system and method require fewer matching operations and have a lower miss rate, power consumption is also significantly lower than in conventional cache systems. Other advantages and applications of the present invention will be apparent to those skilled in the art.
Brief description of the drawings
FIG. 1 is a diagram of an instruction prefetch structure according to the present invention in which the level-2 cache is organized in multiple ways.
FIG. 2 is an embodiment of the movement of the tracker read pointer according to the present invention.
FIG. 3 is an embodiment of the relationship among level-1 instruction blocks, level-2 instruction blocks, and the corresponding storage units according to the present invention.
FIG. 4 is a specific embodiment of the present invention in which the level-2 cache is organized in two ways.
FIG. 5 is another specific embodiment of the present invention in which the level-2 cache is organized in two ways.
FIG. 6 is another specific embodiment of the scanner in the level-2 cache structure of the present invention.
FIG. 7 shows the memories and formats used in a micro track table organized in a fully associative manner.
FIG. 8 is an embodiment of a fully associative micro track table.
Best mode for carrying out the invention
FIG. 4 shows the best mode of the present invention.
Embodiments of the invention
The high-performance cache system and method proposed by the present invention are described in further detail below with reference to the drawings and specific embodiments. The advantages and features of the present invention will become clearer from the following description and the claims. Note that the drawings are greatly simplified and not drawn to precise scale; they serve only to help explain the embodiments of the present invention conveniently and clearly.
Note that, to explain the present invention clearly, several embodiments are given to illustrate different implementations of the invention; these embodiments are illustrative, not exhaustive. In addition, for brevity, content already covered in an earlier embodiment is often omitted from later embodiments, so for anything not mentioned in a later embodiment, refer to the earlier embodiments.
Although the invention can be extended through many forms of modification and substitution, the specification sets out some specific embodiments and describes them in detail. It should be understood that the inventor does not intend to limit the invention to the particular embodiments described; on the contrary, the inventor intends to protect all improvements, equivalent transformations, and modifications made within the spirit or scope defined by the claims. The same reference numerals may be used throughout the drawings to denote the same or similar parts.
In addition, this specification uses a cache system containing a processor core as an example, but the technical solution of the present invention can also be applied to a cache system containing any suitable processor. For example, the processor may be a general-purpose processor, a central processing unit (CPU), a microcontroller (MCU), a digital signal processor (DSP), a graphics processor (GPU), a system on chip (SOC), an application-specific integrated circuit (ASIC), and so on.
FIG. 1 is an instruction prefetch structure diagram 100 according to the present invention in which the level-2 cache is organized in multiple ways. As shown in FIG. 1, structure 100 includes an active table (active list) 104, a scanner 108, a track table 110, a tracker 114, a level-2 instruction cache (L2 cache) 106, a level-1 instruction cache (L1 cache) 112, and a processor core (CPU core) 116. It should be understood that the components are listed here for ease of description; other components may be included, and some components may be omitted. The components may be distributed across multiple systems, may be physical or virtual, and may be implemented in hardware (e.g., integrated circuits), in software, or in a combination of hardware and software.
In the present invention, an instruction address refers to the storage address of an instruction in main memory; that is, the instruction can be found in main memory at that address. For simplicity, the virtual address is assumed here to equal the physical address; the method of the present invention also applies where address mapping is required.
In the present invention, a branch instruction or branch source refers to any suitable form of instruction that can cause the processor core 116 to change its execution flow (e.g., to execute an instruction out of sequential order). The branch source address may be the instruction address of the branch instruction itself. The branch target refers to the target instruction to which the branch caused by the branch instruction transfers, and the branch target address may refer to the address to which control transfers when the branch is taken, that is, the instruction address of the branch target instruction. The current instruction may refer to the instruction currently being executed or fetched by the processor core, and the current instruction block may refer to the instruction block containing the instruction currently being executed by the processor.
In this embodiment, the level-one instruction cache 112 is fully associative. Each memory row of the level-one instruction cache 112 is called a level-one instruction block, and the level-one instruction cache 112 stores at least one level-one instruction block holding a run of consecutive instructions that includes the current instruction. The level-one instruction cache 112 contains a plurality of level-one instruction blocks, each containing a plurality of instructions. Each level-one instruction block stored in the level-one instruction cache 112 has a level-one block number (BNX1), which is the row number of that block in the level-one instruction cache 112. The level-two instruction cache 106 consists of two identical memories 126 and 128; each memory forms one way, and both ways have the same number of rows, that is, the cache is two-way set associative. Each memory row of memories 126 and 128 is called a level-two instruction block, and each level-two instruction block has a level-two block number (BNX2) determined by its row number in the level-two instruction cache and by the way in which it resides, that is, the index bits of the instruction line address plus a bit indicating the way. Each level-two instruction block contains a plurality of level-one instruction blocks. The level-two block number described in the present invention thus denotes the position of a level-two instruction block in the level-two instruction cache 106.
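As an illustration of how a level-two block number can combine the index bits with a way bit, the following minimal sketch packs and unpacks a BNX2. The index width and the placement of the way bit above the index are assumptions made for the example; the text above does not fix either.

```python
INDEX_BITS = 8  # assumed number of index bits; not specified in the text


def make_bnx2(way: int, index: int) -> int:
    """Pack a way bit and a row index into one level-two block number.

    Placing the way bit above the index bits is an assumption of this sketch.
    """
    if way not in (0, 1):
        raise ValueError("a two-way cache has ways 0 and 1 only")
    if not 0 <= index < (1 << INDEX_BITS):
        raise ValueError("index out of range")
    return (way << INDEX_BITS) | index


def split_bnx2(bnx2: int) -> tuple:
    """Recover (way, index) from a level-two block number."""
    return bnx2 >> INDEX_BITS, bnx2 & ((1 << INDEX_BITS) - 1)
```

With these assumed widths, row 5 of the second way maps to a single number that identifies both the memory (126 or 128) and the row inside it.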
The level-two instruction cache 106 and the level-one instruction cache 112 may comprise any suitable storage device, such as a register or register file, static memory (SRAM), dynamic memory (DRAM), flash memory, a hard disk, a solid state disk (Solid State Disk), or any other suitable storage device or future form of memory. The level-two instruction cache 106 may operate as a cache of the system, or as a level-one cache when other caches are present, and may be partitioned into a plurality of storage segments called memory blocks (Memory Block) for storing the data to be accessed by the processor core 116, such as the instructions in an instruction block (Instruction Block).
The active table 104 contains two tag arrays 118 and 120 and two storage arrays 122 and 124 that store level-one block numbers BNX1. Because the level-two instruction cache 106 is two-way set associative, the active table is also organized in two ways. One tag array and one storage array in the active table 104 correspond to one way of the level-two instruction cache 106: tag array 118 and storage array 122 correspond to level-two cache way 126, and tag array 120 and storage array 124 correspond to level-two cache way 128. The elements of storage arrays 122 and 124 are called entries. Each entry stores a level-one block number BNX1 and a valid bit (Valid bit), recording the relationship between the positions of a level-one instruction block in the level-one and level-two instruction caches. Because each level-two instruction block contains a plurality of level-one instruction blocks, each row of storage arrays 122 and 124 in the active table 104 contains a plurality of entries, each holding the row number BNX1 in the level-one instruction cache 112 of a level-one instruction block belonging to that level-two instruction block.
The scanner 108 examines each level-one instruction block as it is filled from the level-two instruction cache 106 into the level-one instruction cache 112, extracts instruction type information, and determines whether each instruction is a branch or a non-branch instruction. If an instruction is a branch instruction, the scanner calculates its target address, for example by using an adder to add the branch offset to the current instruction address. The calculated branch target address is then sent to the active table 104 for matching.
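The scanning step can be sketched in software as follows. This is a minimal model, not the scanner hardware: the `Instr` record, the 4-byte instruction size, and the byte-granular signed offset are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """Hypothetical decoded instruction: only the fields the scanner needs."""
    is_branch: bool
    offset: int = 0  # signed branch offset in bytes (assumed encoding)


def scan_block(block_base: int, instrs: list, instr_size: int = 4) -> list:
    """Return (branch source address, branch target address) pairs.

    The target is the address of the branch itself plus its offset,
    which mirrors the adder described above.
    """
    found = []
    for i, ins in enumerate(instrs):
        if ins.is_branch:
            source = block_base + i * instr_size
            found.append((source, source + ins.offset))
    return found
```

Each target address produced this way would then be presented to the active table for matching.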
In this embodiment, the rows of the track table 110 correspond one-to-one to the rows of the level-one instruction cache 112, and corresponding rows are pointed to by the same row pointer. Each row of the track table 110 contains a plurality of track points (Track Point), each corresponding to one instruction in a row of the level-one instruction cache 112; that is, the number of track points per row of the track table equals the number of instructions per row of the level-one instruction cache. A track point is an entry in the track table and may contain information on at least one instruction, such as instruction type information and a branch target address. In the present invention, the track table address of a track point corresponds (Correspond) to the instruction address of the instruction it represents, and the track point of a branch instruction contains the address of the branch target, which in turn corresponds to the branch target instruction address. The plurality of consecutive track points corresponding to an instruction block formed by a series of consecutive instructions in the level-one instruction cache 112 is called a track. The instruction block and its track are indicated by the same level-one block number (BNX1). The total number of track points in a track may equal the total number of entries in a row of the track table 110. The track table 110 may also have other organizations.
When the processor core 116 fetches an instruction from the level-one instruction cache 112 as needed, suppose the instruction is stored in neither the level-one instruction cache 112 nor the level-two instruction cache 106. Then, according to the instruction address (PC), the instruction is filled from lower-level memory into the level-two instruction block of the level-two instruction cache 106 pointed to by a level-two block number BNX2 chosen by a replacement algorithm (such as LRU). Next, as required by the processor core 116, the corresponding level-one instruction block in the level-two cache 106 is filled into the memory row of the level-one instruction cache 112 pointed to by a BNX1 chosen by a replacement algorithm (such as LRU). The replacement algorithm may be any existing algorithm, such as first in first out (FIFO), least recently used (LRU), or random replacement (Random). During this process, the scanner 108 examines the types of the instructions in the level-one instruction block, extracts the branch information of any branch instructions, and calculates their target addresses, for example by using an adder to add the branch offset to the current instruction address. Here, the term 'fill' (Fill) means moving instructions from a lower-level memory to a higher-level memory.
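The choice of a BNX1 by a replacement algorithm can be illustrated with a small LRU model. This is a sketch under stated assumptions: the class name `L1Rows` and its interface are hypothetical, and only the row-selection policy is modeled, not the instruction data or the scanner.

```python
from collections import OrderedDict


class L1Rows:
    """Hypothetical model of BNX1 allocation with LRU replacement."""

    def __init__(self, rows: int):
        self.rows = rows
        self.lru = OrderedDict()  # block address -> BNX1, oldest use first

    def fill(self, block_addr: int) -> int:
        """Return the BNX1 holding block_addr, filling a row if needed."""
        if block_addr in self.lru:
            self.lru.move_to_end(block_addr)  # mark as most recently used
            return self.lru[block_addr]
        if len(self.lru) < self.rows:
            bnx1 = len(self.lru)  # a row that has never been used
        else:
            _, bnx1 = self.lru.popitem(last=False)  # evict least recently used
        self.lru[block_addr] = bnx1
        return bnx1
```

Swapping `popitem(last=False)` for a different victim choice would give FIFO or random replacement, the other policies named above.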
Whether a branch target instruction is already stored in the level-two instruction memory 106 can be determined by matching the branch target instruction address examined and calculated by the scanner 108 against the instruction line addresses stored in the active table 104. First, the index bits of the calculated branch target instruction address are used to read out the two tags stored in the active table, and these two tags are compared with the tag bits of the calculated branch target instruction address. If one of them matches, the block offset of the calculated branch target address selects the entry corresponding to the instruction in the matching way. If the level-one block number (BNX1) stored in that entry is valid, the branch target instruction is already stored in the level-one instruction cache 112; the BNX1 stored in the active table is then written, together with the offset (Offset) of the calculated branch target address, into the track point of the track table corresponding to the branch source address. If the BNX1 stored in that entry is invalid, the branch target instruction is not stored in the level-one instruction cache 112 but only in the level-two instruction cache 106; the level-two block number BNX2 corresponding to the instruction is then written, together with the block offset and the offset of the calculated branch target address, into the track point of the track table corresponding to the branch source address. If neither tag matches, the instruction line containing the branch target has not yet been filled into the level-two instruction memory 106; the instruction is then filled, according to the calculated branch target instruction address, from lower-level memory into the level-two instruction block of the level-two instruction cache 106 pointed to by a BNX2 chosen by a replacement algorithm (such as LRU), and that BNX2 is written, together with the block offset and the offset of the calculated branch target address, into the track point of the track table corresponding to the branch source address. In the present invention, matching (Match) refers to comparing two values: the result is 'match' (Match) when the two are identical or equal, and 'no match' (Not Match) otherwise.
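The three outcomes of the active table lookup described above can be sketched as follows. The field widths, the `ActiveTable` class, and the use of `None` to stand for a cleared valid bit are assumptions of this example, not details taken from the specification.

```python
OFFSET_BITS = 4  # position inside a level-one block (assumed width)
SUB_BITS = 2     # block offset: which level-one block in a level-two block (assumed)
INDEX_BITS = 3   # row index of the level-two cache (assumed)


def split(addr: int) -> tuple:
    """Split a target address into (tag, index, block offset, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    sub = (addr >> OFFSET_BITS) & ((1 << SUB_BITS) - 1)
    index = (addr >> (OFFSET_BITS + SUB_BITS)) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + SUB_BITS + INDEX_BITS)
    return tag, index, sub, offset


class ActiveTable:
    """Two tag arrays and two BNX1 storage arrays, one pair per way."""

    def __init__(self):
        rows = 1 << INDEX_BITS
        self.tags = [[None] * rows for _ in range(2)]
        # None models an entry whose valid bit is cleared.
        self.bnx1 = [[[None] * (1 << SUB_BITS) for _ in range(rows)]
                     for _ in range(2)]

    def lookup(self, target: int) -> tuple:
        tag, index, sub, offset = split(target)
        for way in range(2):
            if self.tags[way][index] == tag:
                bnx1 = self.bnx1[way][index][sub]
                if bnx1 is not None:
                    return ("L1", bnx1, offset)           # target already in L1
                return ("L2", (way, index), sub, offset)  # target only in L2
        return ("MISS",)                                  # must fill L2 first
```

The returned tuple is what would be written into the track point of the branch source: a BNX1 and offset in the first case, or a BNX2, block offset, and offset in the second.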
The position of a track point (instruction) in the track table can be expressed by a first address and a second address. The first address is the block number of the instruction corresponding to the track point (pointing to one track in the track table and the corresponding level-one instruction block in the level-one instruction cache), and the second address is the relative position (offset, Address Offset) of the track point (that is, of the corresponding instruction) within that track (memory block). A pair of first and second addresses corresponds to one track point in the track table; that is, the track point can be located in the track table from its first and second addresses. If the type of a track point indicates a branch instruction, the track of the branch target can be determined from the first address contained in the entry, and a specific track point on the target track from the second address. The track table thus becomes a table that represents a branch instruction by letting the address of a track table entry correspond to the branch source address and the content of the entry correspond to the branch target address.
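A toy track table makes the two-address scheme concrete. In this sketch the table is a dictionary of rows and each track point is a `(kind, target)` tuple; these representations and the sample contents are hypothetical.

```python
# Each row is a track; each entry is (kind, target), where target is a
# (first address, second address) pair for branches and None otherwise.
track_table = {
    0: [("N", None), ("B", (2, 1)), ("N", None)],
    2: [("N", None), ("N", None), ("B", (0, 0))],
}


def read_point(first: int, second: int) -> tuple:
    """Locate a track point from its first and second addresses."""
    return track_table[first][second]


def branch_target_position(first: int, second: int) -> tuple:
    """Return the (first, second) address stored in a branch track point."""
    kind, target = read_point(first, second)
    if kind != "B":
        raise ValueError("track point does not represent a branch")
    return target
```

Here the entry address `(0, 1)` plays the role of the branch source and its content `(2, 1)` the role of the branch target.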
To link each track in the track table 110 to the track that follows it in sequential execution, an end track point is placed after the track point representing the last instruction of each track; it stores the first address of the next track (instruction block) in sequential order. If the level-one instruction cache 112 can store multiple instruction blocks, then while the current instruction block is being executed, the next sequential instruction block is also fetched into the instruction read buffer, ready to be read and executed by the processor core 116. The instruction address of the next instruction block can be obtained by adding the address length of one instruction block to the instruction address of the current instruction block. This address is sent to the active table 104 for matching as described above, and the instruction block obtained is filled into the instruction block of the level-one instruction cache 112 indicated by the replacement algorithm. The instructions in the next instruction block newly stored in the level-one instruction cache 112 are also scanned by the scanner 108, and the extracted information fills the track indicated by its level-one block number BNX1, as described above. The replacement algorithm may be any existing algorithm, such as first in first out (FIFO), least recently used (LRU), or random replacement (Random).
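The next-block address calculation and the end track point can be sketched as below. The 64-byte block length and the tuple representation of track points are assumptions made for the example.

```python
BLOCK_BYTES = 64  # assumed address length of one instruction block


def next_block_address(current_block_address: int) -> int:
    """Address of the sequentially next instruction block."""
    return current_block_address + BLOCK_BYTES


def with_end_point(track: list, next_bnx1: int) -> list:
    """Return the track with an end track point appended.

    The end point stores the first address (block number) of the next track.
    """
    return track + [("END", next_bnx1)]
```

In the described flow, `next_block_address` yields the address sent to the active table, and the BNX1 that results is what the end track point records.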
The tracker 114 consists mainly of a selector 130, a register 132, and an incrementer 134. The read pointer of the tracker 114 points to the first branch instruction track point located after the current instruction on the track of the current instruction in the track table 110, or, if there is no branch track point after the current instruction on that track, to the end track point of the track. The read pointer consists of a first address pointer and a second address pointer. The value of the first address pointer is the level-one block number of the level-one instruction block containing the current instruction, that is, the row pointer; the second address pointer points to the first branch instruction track point after the current instruction on the track, or to the end track point.
When the processor core 116 fetches instructions from the level-one instruction cache 112 as needed, the tracker 114 provides the level-one block number BNX1 to address the level-one instruction block, and the processor provides the offset to fetch the corresponding instruction; the processor also supplies a BRANCH signal and a TAKEN signal to the tracker 114. The BRANCH signal indicates whether the instruction is a branch instruction, and the TAKEN signal controls the output of the selector. The tracker 114 points to the first branch instruction after the current instruction, or to the end track point of the track if there is no branch track point after the current instruction on that track, and provides the processor core 116 with the level-one block number BNX1 of the current instruction.
When the content of the track point pointed to by the read pointer of the tracker 114 contains a level-one block number BNX1, the corresponding instruction is already stored in the level-one instruction cache 112, and when execution reaches that instruction the processor core 116 fetches it directly from the level-one instruction cache 112. When the content of the track point pointed to by the read pointer contains a level-two block number BNX2, the active table is looked up using BNX2 as the active table address. If the BNX1 stored in the entry corresponding to that level-two block number is already valid, some other branch instruction whose target address equals the instruction address corresponding to that level-two block number has already caused the target instruction to be fetched into the level-one instruction cache 112 before execution reached this instruction; that BNX1 is therefore written into the track point, and when execution reaches the instruction the processor core 116 fetches it directly from the level-one instruction cache 112. If the BNX1 stored in the entry corresponding to that level-two block number is invalid, the target instruction is not in the level-one instruction cache 112; a level-one block number BNX1 is then determined according to the replacement policy, the target instruction line is read from the level-two instruction cache 106 and filled into the corresponding level-one instruction block of the level-one instruction cache 112, and that BNX1 is written into the corresponding entry of storage arrays 122 and 124 of the active table 104. When execution reaches the instruction, the processor core 116 fetches it directly from the level-one instruction cache 112.
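The resolution of a BNX2 track point into a BNX1 track point can be sketched as follows. The dictionary-based active table, the `allocate_l1` callback, and the track point layout are hypothetical; writing the freshly allocated BNX1 back into the track point in the invalid case is an assumption of this sketch, since the text states it explicitly only for the valid case.

```python
def resolve(point: dict, active_table: dict, allocate_l1) -> dict:
    """Turn a track point holding a BNX2 into one holding a BNX1.

    point        -- {"kind": "BNX1" | "BNX2", "block": ..., "offset": int}
    active_table -- maps (BNX2, block offset) to a BNX1, or has no entry
                    when the valid bit would be cleared
    allocate_l1  -- hypothetical callback that fills the block from the
                    level-two cache into level one and returns its BNX1
    """
    if point["kind"] == "BNX1":
        return point  # target is already addressed in level one
    key = point["block"]
    bnx1 = active_table.get(key)
    if bnx1 is None:
        bnx1 = allocate_l1(key)   # fill L2 -> L1 under the replacement policy
        active_table[key] = bnx1  # record the new BNX1 in the active table
    # Assumed for the invalid case; stated in the text for the valid case.
    point["kind"], point["block"] = "BNX1", bnx1
    return point
```

Once every pointed-to track point has been resolved this way, the core can fetch the branch target from level one without waiting.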
If the branch instruction pointed to by the tracker 114 is not taken, the read pointer of the tracker 114 moves to the first branch instruction track point after that branch instruction, or, if there is no branch instruction track point among the track points after that branch instruction, to the end track point of the track. The processor core reads and executes the sequential instructions that follow the branch instruction.
If the branch instruction pointed to by the tracker 114 is taken, the branch target instruction block read from the instruction memory 106 as described above is stored in the instruction block of the instruction read buffer 112 designated by the buffer replacement logic, and the new track information already generated by the scanner 108 is filled into the corresponding track of the track table 110. The first address and second address of the branch target then become the new tracker address pointer, pointing to the track point in the track table that corresponds to the branch target. The new tracker address pointer also points to the newly filled branch instruction block, making it the new current instruction block. The processor core selects the required instruction from the new current instruction block using the offset bits of the instruction address (PC). The tracker 114 then moves the read pointer to the first branch instruction track point after the branch target instruction on the track corresponding to the new current instruction block, or, if there is no branch instruction track point among the track points after the branch target instruction, to the end track point of the track.
If the tracker 114 points to the end track point of a track, the read pointer of the tracker 114 is updated to the position value stored in that end track point, that is, it points to the first track point of the next track and thereby to the new current instruction block. The tracker 114 then moves the read pointer to the first branch instruction track point on the track corresponding to the new current instruction block, or, if that track has no branch instruction track point, to its end track point. By repeating this process, an instruction can be filled into the instruction read buffer 112 before the processor core 116 executes it, so that the processor core 116 does not have to wait when fetching the instruction, which improves processor performance.
FIG. 2 shows an embodiment 200 of the movement of the tracker read pointer according to the present invention. In this embodiment, the tracker read pointer skips over the non-branch instructions in the track table, moves to the next branch point in the track table, and waits for the branch decision of the processor core 116. For ease of description, components not relevant to this embodiment are omitted from FIG. 2. In this embodiment, it is assumed that the instruction types and instruction information stored in the track table 110 are arranged from left to right in order of increasing instruction address; that is, when the instructions are executed sequentially, the instruction information and corresponding instruction types are accessed from left to right. It is further assumed that an instruction type of '0' in the track table 110 indicates that the corresponding instruction is a non-branch instruction, and an instruction type of '1' indicates that it is a branch instruction. At any time, the entry representing the instruction type indicated by the second address 216 (offset, BNY) within the track indicated by the first address 214 (level-one block number BNX1) can be read from the track table 110. A plurality of entries, or even all entries, representing instruction types in the track indicated by the first address 214 can also be read from the track table 110 at any time.
In each row of the track table 110, an end entry is added to the right of the entry for the instruction with the largest instruction address, to store the address of the next instruction in sequential execution. The instruction type of the end entry is always set to '1'. The first address of the instruction information in the end entry is the instruction block number of the next instruction, and the second address (BNY) is always zero, pointing to the first entry of that instruction track. The end entry is defined as equivalent to an unconditional branch instruction. Whenever the tracker points to an end entry, an internal control signal is generated that makes the selector 208 select the output 230 of the track table 110, and another internal control signal is generated that makes the register 210 update. These internal signals may be triggered by a special bit contained in the end entry in the track table 110, or by the second address 216 pointing to the end entry.
In FIG. 2, the tracker 114 mainly comprises a shifter 202, a leading zero counter 204, an adder 206, a selector 208, and a register 210. The shifter 202 shifts the plurality of instruction types 218, which represent a plurality of instructions read from the track table 110, to the left by the number of positions given by the second address pointer 216 output by the register 210. The leftmost bit of the shifted instruction types 224 output by the shifter 202 is the step bit (STEP Bit). The step bit signal and the BRANCH signal from the processor core jointly determine when the register 210 is updated. The selector 208 is controlled by the control signal TAKEN, and its output 232 is the next address (Next Address), which contains a first address part and a second address part. When TAKEN is '1' (branch taken), the selector 208 selects the output 230 of the track table 110 (containing the first and second addresses of the branch target) as output 232. When TAKEN is '0' (branch not taken), the selector 208 selects the current first address 214 as the first address part of output 232 and the adder output 228 as the second address part of output 232. The instruction types 224 are sent to the leading zero counter 204, which counts how many '0' instruction types (non-branch instructions) precede the next '1' instruction type (a branch instruction); the step bit is counted as one '0' whether it is '0' or '1'. The resulting number of leading '0's 226 (the step number, STEP Number) is sent to the adder 206 and added to the second address 216 output by the register 210 to give the next branch source address (Next Branch Address) 228. This next branch source address is the second address of the next branch instruction after the current instruction, and the non-branch instructions before it are skipped (Skip) by the tracker 114.
When the second address 216 points to an entry representing an instruction, the shifter controlled by the second address shifts the plurality of instruction types output by the track table 110 to the left together. The instruction type of the instruction read from the track table 110 is thereby shifted into the leftmost position of the instruction types 224, the step bit. The shifted instruction types 224 are sent to the leading zero counter, which counts the number of instructions before the next branch instruction. The output 226 of the leading zero counter 204 is then the step by which the tracker should advance. Adding this step to the second address 216 in the adder 206 gives the next branch instruction address 228.
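The shifter, leading zero counter, and adder can be modeled together in a few lines. This is a behavioral sketch: the instruction types are held in a Python list ordered left to right, and the function name is hypothetical.

```python
def next_branch_bny(types: list, bny: int) -> int:
    """Second address of the next '1' entry after position bny.

    types -- instruction types of one track, left to right, 0 = non-branch,
             1 = branch; the last element is the end entry and is always 1
    bny   -- current second address (must not already be the end entry)
    """
    shifted = types[bny:]   # shifter 202: drop bny entries on the left
    step = 1                # the step bit counts as one '0' whatever its value
    for t in shifted[1:]:   # leading zero counter 204
        if t == 1:
            break
        step += 1
    return bny + step       # adder 206


# Track with branches at positions 2 and 4 and the end entry at position 5.
track_types = [0, 0, 1, 0, 1, 1]
```

For this track, starting from position 0 the pointer advances to 2, from 2 to 4, and from 4 to the end entry at 5, skipping every non-branch entry on the way.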
When the step bit in the shifted instruction types 224 is '0', the entry in the track table 110 pointed to by the second address 216 is a non-branch instruction. In this case the step bit signal causes the register 210 to update, and the selector 208, under control of the TAKEN signal 222 at '0', selects the next branch source address 228 as the second address 216, while the first address 214 remains unchanged. The new first and second addresses now point to the next branch instruction on the same track, and the non-branch instructions before that branch instruction are skipped. The new second address 216 controls the shifter 202 to shift the instruction types 218 so that the instruction type bit representing this branch instruction falls on the step bit of 224 for the next operation.
When the step bit in the shifted instruction types 224 is '1', the entry in the track table 110 pointed to by the second address represents a branch instruction. In this case the step bit signal does not cause the register 210 to update; the register 210 is updated under control of the BRANCH signal 234 from the processor core. The adder output 228 is then the address of the next branch instruction on the same track as the current branch instruction, and the memory output 230 is the target address of the current branch instruction.
When the BRANCH signal is '1', the output 232 of the selector 208 updates the register 210. If the TAKEN signal 222 from the processor core is '0' at this time, the processor core has decided to continue sequential execution at this branch point, and the selector 208 selects the next branch source address 228. The first address 214 output by the register 210 remains unchanged, and the next branch source address 228 becomes the new second address 216. The new first and second addresses now point to the next branch instruction on the same track. The new second address 216 controls the shifter 202 to shift the instruction types 218 so that the instruction type bit representing this branch instruction falls on the step bit of 224 for the next operation.
If the TAKEN signal 222 from the processor core is '1' at this time, the processor core has decided to jump to the branch target at this branch point. The selector then selects the branch target address 230 read from the track table 110 to become the first address 214 and the future second address 216 output by the register 210. The BRANCH signal 234 controls the register 210 to latch these as the new first and second addresses, which point to a branch target address that may not be on the same track. The new second address 216 controls the shifter 202 to shift the instruction types 218 so that the instruction type bit representing this branch instruction falls on the step bit of 224 for the next operation.
When the second address points to the end entry of a track (the next-row entry), the internal control signal, as described earlier, makes selector 208 select output 230 of track table 110 and update register 210. The new first address 214 is then the first address of the next track recorded in the end entry of track table 110, and the second address is zero. The second address controls shifter 216 to shift instruction type 218 by zero bits, and the next operation begins. Repeating this cycle, tracker 114 together with track table 110 skips the non-branch instructions in the track table and always points to a branch instruction.
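The three cases above (branch not taken, branch taken, end entry reached) can be summarised in a small behavioural sketch. This is only an illustration of the selection logic, not the hardware: the names `Pointer` and `next_pointer` are hypothetical, and giving the end-entry case priority over the BRANCH/TAKEN signals is an assumption.

```python
from typing import NamedTuple


class Pointer(NamedTuple):
    bnx: int  # first address: the track (row) in the track table
    bny: int  # second address: the entry within the track


def next_pointer(current, branch, taken, at_end_entry,
                 next_branch_bny, table_target):
    """Return the tracker's next read pointer (behavioural model only).

    table_target is the (BNX, BNY) pair read from the track table entry
    that the current pointer addresses.
    """
    if at_end_entry:
        # End entry: move to the next track, second address restarts at 0.
        return Pointer(table_target.bnx, 0)
    if not branch:
        # BRANCH = 0: the register is not updated.
        return current
    if taken:
        # TAKEN = 1: both addresses come from the track table.
        return table_target
    # TAKEN = 0: same track, advance to the next branch source address.
    return Pointer(current.bnx, next_branch_bny)
```

For example, with the pointer at row 3, entry 4, a not-taken branch whose successor branch sits at entry 7 moves the pointer to row 3, entry 7, while a taken branch with table content (5, 3) moves it to row 5, entry 3.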
Figure 3 shows embodiment 300 of the level-1 instruction block, the level-2 instruction block and their addressing relationship according to the present invention. In Figure 3, assume that instruction address 301 is 40 bits long, so that the most significant bit is bit 39 and the least significant bit is bit 0, and that each instruction address corresponds to one byte. The lowest two bits 302 of instruction address 301 (bits 1 and 0) therefore address the 4 bytes within one instruction word. Assume in this embodiment that the upper 8 bits of instruction address 301 are the process identifier (PID) 310, which indicates which process is currently executing. PID 310 makes it possible to determine whether the currently executing process is stored in the instruction cache; if it is not, prefetching can be performed using the whole address 301, which avoids instruction cache misses. Instruction address 301 may also omit PID 310, in which case the instruction address is 32 bits long. For ease of explanation, the lowest two bits 302 and the upper 8 bits 310 are dropped below, and the remaining 30 bits (bits 31 to 2) form a new instruction line address 312.
In this embodiment, assume that a level-1 instruction block contains 16 instructions, so offset 303 in instruction line address 312 has 4 bits; this offset determines the position of an instruction within its level-1 instruction block. Offset 303 corresponds to the second address (BNY) described for Figure 1, so it also determines which track point in the track table corresponds to the instruction. Assume further that the track table has 512 rows; the level-1 block number BNX1 then has 9 bits, and its value is given by the row number. Therefore, while a level-1 instruction block is being filled from level-2 instruction cache 106 into level-1 instruction cache 112 at the request of processor core 116, if the method described above determines that the target instruction of a branch instruction is already stored in level-1 instruction cache 112, the corresponding level-1 block number stored in active table 104, together with offset 303, is written into the track point of the track table that corresponds to the branch source instruction. When processor core 116 reaches that branch instruction, it can read the target instruction directly from level-1 instruction cache 112.
In this embodiment, tag 311 of instruction line address 312 is stored in tag array 118 or 120 of one way of active table 104 and is compared with the target instruction address produced by scanner 108 to obtain match information. Assume that active table 104 and level-2 instruction cache blocks 126 and 128 each have 1024 rows; index 307 of instruction line address 312 then has 10 bits (bits 17 to 8). Index 307 selects the row of the level-2 instruction cache in which a level-2 instruction block resides. It is also used to read out the tags stored in tag arrays 118 and 120 of each way of active table 104, together with the valid values stored in the corresponding entries of each way. Assume further that one level-2 instruction block stored in level-2 instruction cache block 126 or 128 corresponds to 4 consecutive level-1 instruction blocks; block offset 306 then has 2 bits (bits 7 and 6). Block offset 306 selects a level-1 instruction block within a level-2 instruction block stored in level-2 cache 106, that is, it selects which entry of the active table row supplies the valid value. The way number of level-2 instruction cache 106 that holds the level-2 instruction block, combined with index 307 of instruction line address 312, therefore forms a level-2 block number BNX2. While a level-1 instruction block is being filled from level-2 instruction cache 106 into level-1 instruction cache 112 at the request of the processor, if the method described above determines that the target instruction of a branch instruction is not stored in level-1 instruction cache 112 but is stored in level-2 instruction cache 106, the corresponding level-2 block number BNX2, together with block offset 306 and offset 303 of the branch target address, is written into the track point of the track table that corresponds to the branch source instruction. When the tracker pointer reaches that track point, the corresponding level-1 instruction block is filled from level-2 instruction cache 106 into the level-1 cache block of level-1 instruction cache 112 indicated by a level-1 block number BNX1 chosen by the replacement policy (for example LRU). When processor core 116 reaches the branch instruction, it can read the target instruction directly from level-1 instruction cache 112.
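The field boundaries given for the 40-bit embodiment (PID in bits 39 to 32, tag in bits 31 to 18, index in bits 17 to 8, block offset in bits 7 and 6, offset in bits 5 to 2, byte select in bits 1 and 0) can be checked with a short sketch. The names `Fields` and `split_address` are hypothetical and introduced only for illustration; the 14-bit tag width is inferred from the other widths rather than stated in the text.

```python
from typing import NamedTuple


class Fields(NamedTuple):
    pid: int           # bits 39..32, process identifier 310
    tag: int           # bits 31..18, tag 311 (width inferred)
    index: int         # bits 17..8,  index 307
    block_offset: int  # bits 7..6,   block offset 306
    offset: int        # bits 5..2,   offset 303 (BNY)
    byte: int          # bits 1..0,   byte within the instruction word 302


def split_address(addr: int) -> Fields:
    """Split a 40-bit instruction address into the fields of Figure 3."""
    return Fields(
        pid=(addr >> 32) & 0xFF,
        tag=(addr >> 18) & 0x3FFF,
        index=(addr >> 8) & 0x3FF,
        block_offset=(addr >> 6) & 0x3,
        offset=(addr >> 2) & 0xF,
        byte=addr & 0x3,
    )
```

Building an address from tag 1654, index 14, block offset 2 and offset 3 and passing it to `split_address` returns those same four values, which is the '1654|14|2|3' notation used for target addresses later in the description.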
According to the present invention, a mapping can be established between an instruction's locations in the level-1 and level-2 instruction caches. The level-1 block number BNX1 together with offset 303 of line address 312 determines the position of an instruction within a level-1 instruction block stored in level-1 instruction cache 112. Block offset 306 of instruction line address 312 determines the position of a level-1 instruction block within a level-2 instruction block stored in level-2 instruction cache 106. Index 307 of instruction line address 312 together with the way number of the cache way holding the level-2 instruction block (that is, the level-2 block number BNX2) determines the position of the level-2 instruction block in level-2 instruction cache 106. It should be noted that there is no fixed mapping between BNX1 and BNX2: BNX1 is chosen by the replacement algorithm (for example LRU) when a level-1 instruction block is filled from level-2 instruction cache 106 into level-1 instruction cache 112. The second address (BNY) that gives the instruction's position, however, is the same in both caches, namely offset 303 of instruction line address 312. In this way, the addressing scheme described above establishes the mapping between an instruction's locations in the level-1 and level-2 instruction caches.
Figure 4 shows a specific embodiment 400 in which the level-2 cache of the present invention is organised as a two-way set-associative structure. According to the present invention, the target instruction address computed by scanner 108 is matched against the instruction addresses stored in the active table to obtain match information for that address; the level-2 block number BNX2 or the level-1 block number BNX1 is then written into the track table, creating a new track.
For ease of explanation, target instruction address 312 in this embodiment is only part of the complete instruction address. Target instruction address 312 comprises tag 311, index 307, block offset 306 and offset 303. Tag 311 is compared with tags 302 and 304 in active table 104 to obtain match information; index 307 selects the row of the active table that corresponds to the address; block offset 306 selects the corresponding level-1 instruction block within the level-2 instruction block; offset 303 determines the position of the target instruction within the level-1 instruction line, that is, it provides the second address (BNY).
In this embodiment, level-2 instruction cache 106 consists of two memory blocks 126 and 128 with the same number of rows, that is, it is organised as two ways. The active table is likewise organised as two ways. Active table 104 consists of a first part, tag arrays 118 and 120, and a second part, memory blocks 408 and 410. The first part, tag arrays 118 and 120, is used to match the branch target address computed by scanner 108; the second part stores level-1 block numbers BNX1. Because each level-2 instruction block in a way of level-2 cache 106 corresponds to 4 level-1 instruction blocks, one row of each way of active table 104 corresponds to 4 entries 408 or 410. The track table has the same number of rows as the active table, namely 1024. Each row of level-1 instruction cache 112 holds 16 instructions, that is, a level-1 instruction block contains 16 instructions, so each row of track table 110 has 16 entries.
In this embodiment, assume that a level-1 instruction block fetched from level-2 instruction cache 106 is filled, according to the LRU replacement policy, into row 3 of level-1 instruction cache 112, and that this block contains 3 branch instructions, located at the 4th, 7th and 11th instruction positions of the block. Assume that the value '1654' is stored in the tag of row 14 of way 0 of active table 104, and that the value '2526' is stored in the tag of row 14 of way 1 of active table 104. Assume further that in row 14 of way 0 the valid bit of entry 2 is '1' and the valid bit of entry 3 is '0', and that in row 14 of way 1 the valid bit of entry 2 is '0'.
When scanner 108 scans this level-1 instruction block, it computes the target instruction address of the first branch instruction as '1654|14|2|3': tag 311 of target instruction address 312 is '1654', index 307 is '14', block offset 306 is '2' and offset 303 is '3'. First, as in conventional techniques, index 307 is used to read out the two valid tags stored in row 14 of the active table. The tags read out are sent to comparator 420 and comparator 422 respectively and compared with tag 311 of the branch target instruction address 312 computed by scanner 108; way '0' matches. Block offset 306 of branch target address 312 then selects entry 2 of that active table row. Its valid bit is '1', so the value '5' stored in it is written into row 3, entry 4 of the track table together with the offset (BNY) value '3'; that is, '5|3' is written into row 3, entry 4 of the track table.
When scanner 108 computes the target address of the second branch instruction as '1654|14|3|5', the tag and index are the same as before, block offset 306 is '3' and offset 303 is '5'. The method above selects entry 3 of row 14 of way 0 of the active table. The valid bit of entry 3 is '0', which shows that the branch target instruction is not in level-1 instruction cache 112. The way number in the active table combined with index 307 of target instruction address 312 is therefore used as a level-2 block number (BNX2) and written into the track table together with block offset 306 and offset (BNY) 303; that is, '0|14|3|5' is written into row 3, entry 7 of the track table. Here '0' indicates that the instruction corresponds to way 0 of the active table, '14' indicates row 14 of the active table, '3' indicates entry 3 of that row, and '5' indicates the instruction at position 5 of the level-1 instruction block.
When scanner 108 computes the target address of the third branch instruction as '3546|14|2|8', tag 311 of target instruction address 312 is '3546', index 307 is '14', block offset 306 is '2' and offset 303 is '8'. The method above finds no match in the active table, which shows that the instruction is not in the level-2 cache. The corresponding instruction block is therefore fetched into level-2 cache 106 according to the target address; following the LRU replacement algorithm, the block is placed in row 14, entry 2 of way 1 of the level-2 cache. The way number in the active table combined with index 307 of target instruction address 312 is then used as a level-2 block number (BNX2) and written into the track table together with block offset 306 and offset (BNY) 303; that is, '1|14|2|8' is written into row 3, entry 11 of the track table. The replacement algorithm may be any existing algorithm, such as first in, first out (FIFO), least recently used (LRU) or random replacement (Random).
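The three scanner outcomes in this worked example (level-1 hit, level-2 hit only, level-2 miss) can be reproduced with a small software model of the active table. This is a sketch under stated assumptions: the dictionary layout, the `None` encoding of an invalid entry, and the name `scan_target` are all hypothetical, and the hardware performs the two tag comparisons in parallel rather than in a loop.

```python
WAYS = 2  # two-way organisation, as in embodiment 400


def scan_target(active, tag, index, block_offset, offset):
    """Return the track-point content for one branch target.

    active[way][index] is {"tag": int, "bnx1": [4 slots]}, where a slot
    holds a level-1 block number or None when its valid bit is 0.
    Returns ("BN1", bnx1, offset), ("BN2", way, index, block_offset,
    offset), or None when the target is not in the level-2 cache.
    """
    for way in range(WAYS):
        row = active[way].get(index)
        if row is not None and row["tag"] == tag:
            bnx1 = row["bnx1"][block_offset]
            if bnx1 is not None:
                return ("BN1", bnx1, offset)      # already in level 1
            return ("BN2", way, index, block_offset, offset)
    return None                                   # level-2 miss


# State assumed in the example: row 14 of way 0 holds tag 1654 with
# entry 2 valid (BNX1 = 5); row 14 of way 1 holds tag 2526.
active = [
    {14: {"tag": 1654, "bnx1": [None, None, 5, None]}},
    {14: {"tag": 2526, "bnx1": [None, None, None, None]}},
]
first = scan_target(active, 1654, 14, 2, 3)   # ("BN1", 5, 3)
second = scan_target(active, 1654, 14, 3, 5)  # ("BN2", 0, 14, 3, 5)
third = scan_target(active, 3546, 14, 2, 8)   # None: fetch into level 2
```

After the miss, the description fills the block into way 1, row 14; replacing that row's tag with 3546 in the model and repeating the lookup would yield `("BN2", 1, 14, 2, 8)`, matching the '1|14|2|8' written to the track table.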
When the read pointer of tracker 114 points to row 3, entry 4 of the track table, the value '5|3' read from that track point contains a level-1 block number, which shows that the target instruction of the branch instruction is already stored in row 5 of level-1 cache 112. When execution reaches that instruction, processor core 116 can read it directly from row 5 of level-1 instruction cache 112.
In this embodiment, assume that some branch instruction whose target instruction address is '1654|14|3|5' has already been executed, which means that the target instruction has been filled into level-1 instruction cache 112. Assume further that the target instruction is stored in row 9 of level-1 instruction cache 112. The value '9' is then written into row 14, entry 3 of way 0 of the active table, and the valid bit of that entry is set to '1'.
Therefore, when the read pointer of tracker 114 points to row 3, entry 7 of the track table, the value '0|14|3|5' read from that track point contains a level-2 block number BNX2. Way number '0' selects way 0 of the active table, and the index and block offset select row 14, entry 3, where the stored level-1 block number BNX1 is found to be valid. The instruction can therefore be read directly from row 9 of the level-1 cache using this BNX1, without reading it again from the level-2 cache. At the same time, the level-1 block number '9' stored in that entry is written into row 3, entry 7 of the track table; that is, the value '9|5', which contains the BNX1 information, is stored in row 3, entry 7 of track table 110, completing the track table update. When execution reaches that instruction, processor core 116 can read it directly from row 9 of level-1 instruction cache 112.
When the read pointer of tracker 114 points to row 3, entry 11 of the track table, the value '1|14|2|8' read from that track point contains a level-2 block number BNX2. Using the level-2 block number together with block offset 306 as the active table address, as described above, the level-1 block number BNX1 stored in row 14, entry 2 of way 1 of active table 104 is found to be invalid, which shows that the corresponding branch target instruction is not in level-1 instruction cache 112. The corresponding level-1 instruction block stored in level-2 cache 106 is therefore filled into the level-1 instruction block indicated by the level-1 block number BNX1 chosen by the replacement algorithm, here 38; that is, the block is filled from level-2 instruction cache 106 into row 38 of level-1 instruction cache 112. At the same time, the value '38' is written into row 14, entry 2 of way 1 of the active table and the valid bit of that entry is set to '1', and the value '38|8', which contains the BNX1 information, is written into row 3, entry 11 of track table 110, completing the updates of the active table and the track table. The replacement algorithm may be any existing algorithm, such as first in, first out (FIFO), least recently used (LRU) or random replacement (Random).
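The tracker-side handling of a track point, as walked through above, can be sketched as a function that turns a level-2 reference into a level-1 reference, filling level 1 on demand. The data layout, the tuple encoding of track points, and the names `resolve` and `allocate_l1` are hypothetical; the replacement policy is abstracted as a callback that returns the chosen level-1 row.

```python
def resolve(entry, active, allocate_l1):
    """Convert a track-point value into level-1 form.

    entry is ("BN1", bnx1, bny) or ("BN2", way, index, block_offset, bny).
    active[way][index]["bnx1"] is a list of 4 slots holding a level-1
    block number, or None when the valid bit is 0.
    allocate_l1() stands in for the level-2 to level-1 fill and returns
    the level-1 row selected by the replacement policy.
    """
    if entry[0] == "BN1":
        return entry                       # already points into level 1
    _, way, index, block_offset, bny = entry
    slots = active[way][index]["bnx1"]
    if slots[block_offset] is None:
        # Not in level 1 yet: fill it and record BNX1 in the active table.
        slots[block_offset] = allocate_l1()
    # The caller writes this value back into the track table.
    return ("BN1", slots[block_offset], bny)
```

With entry 3 of way 0, row 14 already holding 9, `resolve(("BN2", 0, 14, 3, 5), ...)` returns `("BN1", 9, 5)` without a fill; with entry 2 of way 1, row 14 invalid and a policy that picks row 38, `resolve(("BN2", 1, 14, 2, 8), ...)` returns `("BN1", 38, 8)` and leaves 38 recorded in the active table.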
According to the present invention, each entry of the active table may additionally contain a field P, which stores the way number from the level-2 block number of the level-2 instruction block that precedes this entry's level-2 instruction block in sequential address order, and a field N, which stores the way number from the level-2 block number of the level-2 instruction block that follows it. Then, when the scanner finds that the target of the branch instruction under examination lies in the level-2 instruction block immediately before or after the one containing the branch instruction, the way number of that neighbouring block can be read from the active table directly, using the level-2 block number of the branch instruction under examination. Concatenating this way number with the branch instruction's index decremented or incremented by one yields the level-2 block number of the previous or next level-2 instruction block, which avoids sending the branch target instruction address to the active table for matching.
In the present invention, when the scanner examines a level-1 instruction block (the current level-1 instruction block) that is the last level-1 instruction block of its level-2 instruction block (the current level-2 instruction block), the end track point for the current level-1 instruction block is created as described earlier. If the level-2 instruction block that contains the sequentially next level-1 instruction block (the next level-2 instruction block) is already stored in the level-2 cache, its level-2 block number is written directly into the end track point as the track point content. If the next level-2 instruction block is not yet stored in the level-2 cache, it is filled, as described earlier, into the level-2 cache location chosen by the replacement algorithm, and the resulting level-2 block number is written into the end track point as the track point content. At this point, the level-2 block number of the block that sequentially follows the current level-2 instruction block is simply the level-2 block number of the next level-2 instruction block, so its way number can be written into field N of the active table entry addressed by the level-2 block number of the current level-2 instruction block (the current level-2 block number). Likewise, the level-2 block number of the block that sequentially precedes the next level-2 instruction block is the current level-2 block number, so its way number can be written into field P of the active table entry addressed by the level-2 block number of the next level-2 instruction block.
In addition, fields P and N of an active table entry can be filled or updated as follows. When a new level-2 instruction block is filled into the level-2 cache, the current level-2 instruction block has the same tag as its sequentially previous or next level-2 instruction block, and their indexes differ by '1'. The index of the level-2 instruction block address can therefore be decremented and incremented by one to obtain the indexes of the sequentially previous and next level-2 instruction blocks, and the contents stored at those positions in all ways of the active table are read out using the computed indexes. All tags read out are then compared with the tag of the current level-2 instruction block. If a tag matches among the ways at the current index minus one, the way number of the matching entry is written into field P of the active table entry addressed by the current level-2 block number, and the way number of the current level-2 block number is written into field N of the matching entry. If a tag matches among the ways at the current index plus one, the way number of the matching entry is written into field N of the active table entry addressed by the current level-2 block number, and the way number of the current level-2 block number is written into field P of the matching entry.
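The fill-time update of fields P and N can be sketched as follows. This is an illustrative model only: the dictionary layout and the name `link_neighbours` are hypothetical, and skipping neighbours whose index would fall outside the row range is a simplification, since the text only covers the case where the neighbouring block shares the same tag.

```python
ROWS = 1024  # rows per way, as assumed in the embodiment


def link_neighbours(active, way, index, tag):
    """After filling a level-2 block at (way, index), update P and N.

    active[w][i] is {"tag": int, "P": way-or-None, "N": way-or-None}.
    """
    row = active[way][index]
    # (index delta, field set in the new row, field set in the neighbour)
    for delta, mine, theirs in ((-1, "P", "N"), (1, "N", "P")):
        j = index + delta
        if not 0 <= j < ROWS:
            continue                     # simplification: no wrap-around
        for w, table in enumerate(active):
            other = table.get(j)
            if other is not None and other["tag"] == tag:
                row[mine] = w            # neighbour's way number
                other[theirs] = way      # back-link to the new block
```

For instance, if way 0 already holds a block with the same tag at index 13 and a new block is filled at way 1, index 14, the new row gets P = 0 and the row at index 13 gets N = 1. A scanner could then form the neighbour's level-2 block number from that way number and the index minus one, without a tag comparison.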
Figure 5 shows another specific embodiment 500 in which the level-2 cache of the present invention is organised as two ways. In this embodiment, target instruction address 312 is again only part of the complete instruction address. Assume that a level-1 instruction block contains 4 instructions, so offset 303 in instruction line address 312 has 2 bits; this offset, called BN1Y, determines the position of an instruction within its level-1 instruction block. Assume further that the track table has 128 rows; the level-1 block number BN1X (the BNX1 used above) then has 7 bits, and its value is given by the row number. BN1X concatenated with BN1Y is called BN1 and determines the position of the instruction in level-1 instruction cache 112. Because a level-2 instruction block contains 4 level-1 instruction blocks, block offset 306 has 2 bits. Block offset 306 concatenated with offset 303 is called BN2Y. Assume further that the active table has 1024 rows, so index 307 has 10 bits; the index combined with the corresponding way number is called the level-2 block number BN2X (the same as the BNX2 used above).
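The concatenations that define BN1, BN2X and BN2Y in embodiment 500 can be written out as bit operations. This is a sketch with hypothetical function names; placing the way number in the high bits of BN2X is an assumption, chosen to match the way-first order of the 'way|index|block offset|offset' notation used earlier.

```python
def pack_bn1(bn1x: int, bn1y: int) -> int:
    """BN1 = BN1X (7 bits) concatenated with BN1Y (2 bits)."""
    assert 0 <= bn1x < 128 and 0 <= bn1y < 4
    return (bn1x << 2) | bn1y


def pack_bn2(way: int, index: int, block_offset: int, offset: int):
    """Return (BN2X, BN2Y) for a two-way, 1024-row active table."""
    assert way in (0, 1) and 0 <= index < 1024
    assert 0 <= block_offset < 4 and 0 <= offset < 4
    bn2x = (way << 10) | index            # way number above the index
    bn2y = (block_offset << 2) | offset   # block offset 306, offset 303
    return bn2x, bn2y
```

With these widths BN1 fits in 9 bits, BN2X in 11 bits and BN2Y in 4 bits; for example, level-1 row 5 with BN1Y = 3 packs to 23.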
The structure of this embodiment is basically the same as that of FIG. 4. The only change is that each row of the active table 104 gains entries that store the addresses of the instruction blocks immediately preceding and following the instruction block represented by that row, together with selectors that serve these new entries. In the left array of 104, each row (representing one level-2 cache block) holds, in addition to the tag entry 118 from FIG. 4 and the 4 entries 408 that store the addresses of the 4 level-1 cache blocks of that row, a new entry 501 that stores the address of the previous level-2 cache block in address order and a new entry 503 that stores the address of the next level-2 cache block in address order. Correspondingly, on the output side of the left array, the output of entries 408 is still selected by the original selector 521, and the output of selector 521 together with the outputs of the new entries 501 and 503 is selected by a new selector 531. Likewise, the right array gains an entry 502 that stores the address of the previous level-2 cache block, an entry 504 that stores the address of the next level-2 cache block, and a selector 532 corresponding to selector 531.
As in FIG. 4, comparator 420 controls one tri-state gate that places the output of selector 531 on the bus to the track table 110 for storage, and comparator 422 controls another tri-state gate that places the output of selector 532 on the same bus to the track table 110. The results of comparing tag 118 and tag 120 with the input address determine which selector's output (that is, the instruction address of which way) is sent to the track table 110 for storage.
Because the cache in this embodiment is set-associative, the index address of the level-2 instruction block preceding or following the current level-2 instruction block can be obtained by subtracting 1 from or adding 1 to the index address of the current level-2 instruction block (307 in FIG. 4). The new previous-block entries 501 and 502 and next-block entries 503 and 504 therefore only need to store the way number of the way group that holds the previous or next level-2 instruction block. For ease of description, in the following embodiments a 'branch source instruction' is a direct branch instruction unless otherwise stated.
When a level-2 instruction sub-block in the level-2 cache 106 is filled into the level-1 cache 112 according to the LRU replacement policy, the scanner 108 examines the sub-block as it is sent from the level-2 cache 106 to the level-1 cache 112. When it finds that an instruction in the sub-block is a branch instruction, it calculates the branch target address of that branch source instruction.
To reduce power consumption by reducing the number of accesses to the active table 104, the scanner 108 determines whether the location of the branch target instruction lies beyond the boundary of the level-1 instruction block, the boundary of the current level-2 instruction block, or the boundary of the level-2 instruction block preceding or following the current one, and uses this determination to lower the frequency of accesses to the active table 104.
In this embodiment, the address boundary determination for the branch target address is made by adding the branch offset to the low-order bits of the base address. As shown in FIG. 5, the branch offset (OFFSET) 571 is added to the low-order base address bits 581, and carry signals (574, 575 and 576) are extracted at three boundaries of the adder. The three signals are processed by priority logic, so that an asserted 'within bounds' signal for the largest data block invalidates the 'within bounds' signals for the smaller data blocks.
As shown in FIG. 5, the low-order base address bits 581 are divided into 3 parts: the first part is the offset 303 of the base address 311, the second part is the block offset 306, and the third part 579 is the address bit one position above the block offset 306. The branch offset 571 is divided into two parts: the low-order part 573 corresponds to the low-order part 581 of the base address 311, and the remainder is the high-order part 572. Similarly, the resulting sum 582 is divided into 3 parts at the same boundaries as the base address, and carry signals 574, 575 and 576 are generated at the respective boundaries.
Taking a positive branch offset 571 as an example, the address boundary determination is made as follows:
1. If the high-order part 572 of the branch offset 571 is not all '0', the branch target address calculated by the adder lies beyond the level-2 instruction block that follows the current level-2 instruction block. This is called case 1.
2. If the high-order part 572 of the branch offset 571 is all '0' and the carry signals 574, 575 and 576 are all '0', the branch target address is in the level-1 instruction block that contains the branch source instruction. This is called case 2.
3. If the high-order part 572 of the branch offset 571 is all '0', the carry signal 574 is '1', and the carry signals 575 and 576 are '0', the branch target address is in the level-2 instruction block that contains the branch source instruction. This is called case 3.
4. If the high-order part 572 of the branch offset 571 is all '0', the carry signal 575 is '1', and the carry signal 576 is '0', the branch target address is in the level-2 instruction block that follows the one containing the branch source instruction. This is called case 4.
5. If the high-order part 572 of the branch offset 571 is all '0' and the carry signal 576 is '1', the branch target address lies beyond the level-2 instruction block that follows the one containing the branch source instruction. This is the same as case 1 and is also called case 1.
When the branch offset 571 is negative, the address boundary determination can be made in the same way. The difference is that the high-order part 572 of the branch offset 571 is first checked for being all '1'. If the high-order part 572 is not all '1', the result is case 1. If the high-order part 572 is all '1' and the carry signals 574, 575 and 576 are all '0', the result is case 2. If the high-order part 572 is all '1', the carry signal 574 is '1', and the carry signals 575 and 576 are '0', the result is case 3. If the high-order part 572 is all '1', the carry signal 575 is '1', and the carry signal 576 is '0', the result is case 4. If the high-order part 572 is all '1' and the carry signal 576 is '1', the result is case 1.
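The outcome of this determination can be sketched behaviourally as follows. This is not the carry-signal and priority-logic circuit described above: as a simplification, the sketch computes the full target address and compares block numbers directly, which is assumed to yield the same four cases. It also assumes addresses counted in instructions, 4 instructions per level-1 block and 4 level-1 blocks per level-2 block, and it ignores wrap-around of the index field; the function name `classify_target` is hypothetical.

```python
L1_BITS = 2  # offset 303: 4 instructions per level-1 block
L2_BITS = 2  # block offset 306: 4 level-1 blocks per level-2 block


def classify_target(source_addr: int, branch_offset: int) -> int:
    """Return the address boundary case (1 to 4) for a direct branch."""
    target_addr = source_addr + branch_offset
    # Case 2: target in the same level-1 instruction block as the source.
    if target_addr >> L1_BITS == source_addr >> L1_BITS:
        return 2
    source_l2 = source_addr >> (L1_BITS + L2_BITS)
    target_l2 = target_addr >> (L1_BITS + L2_BITS)
    # Case 3: target in the same level-2 instruction block as the source.
    if target_l2 == source_l2:
        return 3
    # Case 4: target in the level-2 block just before or just after the source's.
    if abs(target_l2 - source_l2) == 1:
        return 4
    # Case 1: anything further away needs a full active-table lookup.
    return 1
```

For example, with the source at address 21 (level-1 block 5, level-2 block 1), an offset of +1 stays in the same level-1 block (case 2), +4 stays in the same level-2 block (case 3), +12 or -8 lands in an adjacent level-2 block (case 4), and +40 lands further away (case 1).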
The frequency of accesses to the active table can therefore be reduced on the basis of the above relationships. When the scanner 108 scans an instruction segment, it calculates the branch target instruction address from the BN1X base address and PC address of that segment, which are temporarily stored in the scanner. The location of the calculated branch target falls into one of the following cases.
When the scanner 108 arrives at address boundary determination case 1, the branch target instruction address calculated by the scanner 108 is sent over bus 507 to the active table 104. The row selected by the index bits of that address is read out, and the tag read out is matched against the tag of the calculated branch target instruction address. If the match succeeds, subsequent operations are as described above. If the match fails, the corresponding instruction block is fetched from lower-level memory according to the calculated branch target address and filled into the level-2 cache block chosen by the replacement policy; subsequent operations are as described above.
When the scanner 108 arrives at case 2, the branch target address and the branch source address are in the same level-1 instruction block; that is, the branch target instruction and the branch source instruction have the same BN1X. In this case, the tri-state gates of all way groups (such as tri-state gate 541) are forced off, and the branch source BN1X stored in the scanner is combined with the calculated offset 303 (the branch target BN1Y) into a BN1, which is written over bus 505 into the entry of the track table 110 pointed to by the branch source BN1X and BN1Y temporarily stored in the scanner 108. When execution reaches the branch source instruction, the processor 116 can read the instruction directly from the level-1 cache 112.
When the scanner 108 arrives at case 3, the branch target address and the branch source address are in the same level-2 instruction block; that is, the branch target instruction and the branch source instruction have the same BN2X. In this case, the BN2X (way number and index bits) of the instruction block containing the branch source instruction, stored in the scanner, is sent over bus 507 to index the active table 104 and read the second storage block (such as second storage block 408 or 410) of the corresponding entry, and the calculated block offset 306 selects the corresponding storage field within that second storage block. If the BN1X value stored in that field is valid, the tri-state gate of the way group given by the way number in the branch source BN2X is forced on and the tri-state gates of the other way groups are forced off, so that the BN1X value is sent over bus 508 to the track table 110. At the same time, the branch target BN1Y, obtained by removing the block offset 306 from the calculated branch target BN2Y, is sent over bus 505 to the track table 110. The two are combined into the branch target BN1 and written into the entry of the track table 110 pointed to by the branch source BN1X and BN1Y temporarily stored in the scanner 108. If the BN1X value stored in that field is invalid, the tri-state gates of all way groups are forced off, and the branch source BN2X stored in the scanner 108 is combined with the calculated branch target BN2Y into a BN2, which is written over bus 505 into the entry of the track table 110 pointed to by the branch source BN1X and BN1Y temporarily stored in the scanner 108. Subsequent operations are as described above.
When the scanner 108 arrives at case 4, the branch target address is in the level-2 instruction block preceding or following that of the branch source address; that is, the index value of the branch target instruction differs from the index value of the branch source instruction by ±1 (-1 for the preceding level-2 instruction block, +1 for the following one). In this case, the branch source BN2X (way number and index bits) stored in the scanner is sent over bus 507 to index the active table 104 and read the third storage block (such as third storage blocks 501 and 502, or 503 and 504) of the corresponding entry. According to the address boundary determination, storage field P (such as third storage block 501 or 502) is selected when the branch target address is in the preceding level-2 instruction block, and storage field N (such as third storage block 503 or 504) is selected when the branch target address is in the following level-2 instruction block. If the way number stored in the selected field is valid, the tri-state gate of the way group given by the way number in the branch source BN2X is forced on and the tri-state gates of the other way groups are forced off, so that the way number is sent over bus 508 to the track table 110. At the same time, the new index value, obtained by decrementing or incrementing the branch source index bits stored in the scanner 108, and the calculated branch target BN2Y are sent over bus 505 to the track table 110. These are combined into the branch target BN2 and written into the entry of the track table 110 pointed to by the branch source BN1X and BN1Y temporarily stored in the scanner 108. If the way number stored in the selected field is invalid, the branch target address calculated by the scanner 108 is sent over bus 506 to the active table 104 for index matching, and subsequent operations are the same as for address boundary determination case 1.
The above embodiment reduces how often tags in the active table 104 are read and compared with addresses. In cases 3 and 4, however, the way number and index address 307 must still be used to look up directly, within one row of the active table 104 in FIG. 5, entries 408 and 410 to obtain the level-1 address within the same level-2 instruction block, entries 501 and 502 to obtain the previous level-2 address, or entries 503 and 504 to obtain the next level-2 address. If, while the scanner 108 scans an instruction block being filled from the lower-level cache 126 or 128 into the higher-level cache 112, the above entries of the active table (104) row corresponding to that instruction block (the row with the same way number and the same index address) are loaded into the scanner 108 for temporary storage, the frequency of accesses to the active table 104 can be reduced further. Furthermore, when the temporary storage in the scanner 108 has multiple independent read ports, several branch instructions in the instruction segment being scanned can each, at the same time and according to the address boundary determination for its own branch target, independently access the read port assigned to it and obtain by mapping the BN1 or BN2 form address of its branch target, ready to be stored in the track table 110.
FIG. 6 shows another embodiment 600 of the scanner in the level-2 cache structure of the present invention. In this embodiment, each instruction block of the higher-level cache 112 contains 4 instructions, so the offset 303 (BNY) address has two bits; each cache block of the lower-level cache 126 or 128 contains 4 higher-level cache blocks, so the block offset 306 address also has two bits. One row of the active table 104 corresponds to one lower-level cache block; the row contains 4 entries storing BN1X addresses, like those in 408, an entry storing the way number of the previous lower-level block, like 501, and an entry storing the way number of the next lower-level block, like 503. Each fill transfers one higher-level cache block of 4 instructions from the lower-level instruction cache 126 or 128 to the higher-level cache 112. The scanner 108 accordingly has a decoding and judgment module 601 containing 4 instruction decoding and judgment sub-blocks, each of which contains an instruction decoder and an adder such as 607. The scanner 108 also contains a micro active block 660. The complete scanner 608 can be substituted for the scanner 108 in FIG. 5; the rest of the top-level structure is the same as in FIG. 5, and only the track table 110 is shown in the figure.
When a higher-level instruction block is sent to the scanner 608 for scanning, its corresponding active-table row is read from the active table 104 at the same time and, together with the row's way number, index value 307 (that is, the lower-level cache block number of the instruction block) and block offset 306, is sent to the scanner 108 for temporary storage. The storage in the scanner 108 for the tag entry 118 of the active-table row and for the block offset 306 is not shown in FIG. 6. The micro active block 660 in the scanner 608 contains 4 storage entries 620, 621, 622 and 623, which store the 4 BN1X addresses held in an entry such as 408 of the active table 104. Block 660 also contains 3 entries 624, 625 and 626: entry 624 stores the way number of the previous lower-level cache block, as in entry 501 of 104; entry 625 stores the way number and index address of the current lower-level cache block; and entry 626 stores the way number of the next lower-level cache block, as in entry 503 of 104. What entry 625 stores is the level-2 way number and index address 307 of the instruction block being scanned, which are read into the scanner 608 together with the instruction block.
The micro active block also has 5 selectors 670-674. Four of them, 670-673, have the same structure: according to the address boundary determination produced by the corresponding decoding and judgment sub-block, each selects the contents of entries 620-626 and, directly or after an arithmetic operation, produces a BN1X or BN2X address. This address, together with the address offset 303 calculated by an adder such as 607, is stored as the branch target address of the scanned instruction in the track table entry corresponding to that instruction. The fifth selector 674 selects the contents of entries 620-626 to fill the end track point of the track; its selection control differs from that of selectors 670-673.
Each decoding and judgment sub-block of the decoding and judgment module 601 corresponds to one of the 4 instructions. The instruction decoder in a sub-block decodes the instruction it is responsible for. If the instruction is not a branch instruction, the instruction type produced by decoding is stored in the track table entry corresponding to that instruction, and the scanner does not calculate a branch address for it. If the instruction is a branch instruction (referred to below as the branch source instruction), the sub-block produces an address boundary determination as in the previous example, which is used to select the branch target address; the branch target address and the decoded instruction type are then filled into the entry of the track table 110 corresponding to the branch source instruction. The following examples describe the case where the instruction decoded by a sub-block is a branch instruction.
The following description assumes a positive branch offset for ease of understanding; negative branch offsets are handled analogously. As in the previous example, each sub-block first checks whether the high-order part 572 of the instruction's branch offset 571 is all '0'. If it is not all '0', the address boundary determination is case 1. In that case, the branch offset of the instruction is added to the base address of the instruction. The base address is the concatenation of the tag temporarily stored in the scanner (from entry 408 in the active table 104), the index value (307, stored in entry 625), the block offset 306, and the offset 303 (BNY). The first 3 parts of the base address are the same for all 4 instructions of an instruction block; only BNY differs. In instruction order, the BNY of the first instruction is '0', and the BNY values of the following 3 instructions are '1', '2' and '3'. The sum is the memory address of the branch target. The index part 307 of this memory address is used to read one row from each of the left and right arrays of the active table 104. The block offset 306 of the memory address controls selector 521 to select the BN1X stored in one of the 4 entries 408 of the row, which passes through selector 531 (which always selects the output of 521 in case 1) to the tri-state gate 541. The tag entry 118 of the row is compared in comparator 420 with the tag part 311 of the branch target's memory address. If they are equal, the comparison result enables the tri-state gate 541, and the output of gate 541 is combined with the offset 303 (BNY) of the memory address and sent to the track table, where it is stored in the entry corresponding to the scanned instruction. If instead the tag entry 120 of the right array equals the tag part 311 of the branch target's memory address in comparator 422, the BN1X in the address finally stored in the track table comes from entry 410; the principle is the same and is not repeated. The case where the high-order part of the branch offset is all '0' is described below.
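The case 1 lookup described above can be sketched in software as follows. This is a minimal model under stated assumptions, not the hardware itself: each array of the active table is modelled as a dictionary from index 307 to a row, a row is a dictionary with a `tag` and a list `bn1x` of four entries (`None` when the level-1 block is not present), and the name `lookup_case1` is hypothetical. The two tag comparisons stand in for comparators 420 and 422; indexing `bn1x` by the block offset stands in for selectors 521 and 522's left-array counterpart.

```python
from typing import Optional, Tuple


def lookup_case1(left_array: dict, right_array: dict,
                 tag: int, index: int, block_offset: int
                 ) -> Optional[Tuple[int, Optional[int]]]:
    """Two-way tag match; return (way, BN1X or None) on a hit, None on a miss."""
    for way, array in enumerate((left_array, right_array)):
        row = array.get(index)            # read the row selected by index 307
        if row is not None and row["tag"] == tag:
            # On a tag match, the block offset 306 picks one of the 4 BN1X entries.
            return way, row["bn1x"][block_offset]
    return None                           # miss: fetch the block from lower-level memory
```

A hit whose BN1X is `None` corresponds to a level-2 block that is present while the requested level-1 block has not yet been filled.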
Each decoding and judgment sub-block adds the branch offset 571 of the branch instruction it handles to the block offset 306 and offset 303 of that instruction's base address, using the adder within the sub-block, such as 607 (the higher bits of the base address, such as index bits 722 and tag 721, are not used). Each sub-block produces an address boundary determination from the carry signals generated by the addition, as described above, and from that determination produces control signals that make a selector choose the appropriate value from storage entries 620-626 for filling the track table. Take the first instruction, in order, of the scanned instruction block as an example: the branch offset 571 of that instruction is added in adder 607 to the block offset 306 and offset 303 of the instruction (for the first instruction in order, the offset 303 is '0'). If the address boundary determination of the sub-block is case 1, then, as described above, the memory address of the branch target is calculated and sent by the scanner 608 to the active table 104, where it is mapped to a level-1 cache address BN1, which is stored in the track table entry corresponding to the branch source instruction.
If the address boundary determination is case 2 or case 3, the block offset 306 of the sum produced by adder 607 is placed on control line 610 to control selector 670. If the block offset 306 is '00', selector 670 selects the contents of storage entry 620. If the valid bit of those contents is 'valid', selector 670 outputs the BN1X address in storage entry 620; if the valid bit is 'invalid', selector 670 outputs the way number and index 307 stored in storage entry 625. The output way number and index 307 are combined with the block offset 306 and offset 303 (BNY) of the sum produced by adder 607 and sent to the first entry of a track in the track table 110. That track is the one corresponding to the level-1 cache block, in the level-1 cache, into which the scanned instruction block is being stored. When the block offset 306 of the sum produced by adder 607 is '01', '10' or '11', selector 670 correspondingly selects the contents of storage entry 621, 622 or 623; if the contents are invalid, entry 625 is selected, as above.
If the address boundary determination is case 4 and the branch target instruction is in the previous level-2 cache block, control line 610 makes selector 670 select storage entry 624 to read the way number it holds and storage entry 625 to read the index 307 it holds. The previous-block way number from entry 624, the index 307 from entry 625 minus '1', and the block offset 306 and offset 303 of the sum produced by adder 607 are combined into a BN2 address, which is stored in the first entry of the above track. If the address boundary determination is case 4 and the branch target instruction is in the next level-2 cache block, control line 610 makes selector 670 select storage entry 626 to read the way number it holds and storage entry 625 to read the index 307 it holds. The next-block way number from entry 626, the index 307 from entry 625 plus '1', and the block offset 306 and offset 303 of the sum produced by adder 607 are combined into a BN2 address, which is stored in the first entry of the above track.
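The selection performed by selector 670 for cases 2, 3 and 4 can be sketched as follows. The sketch makes several assumptions that are not in the original text: the micro active block is modelled as a dictionary keyed by the entry numerals 620-626, an invalid entry is represented by `None`, and results are returned as tagged tuples. Falling back to a full active-table lookup when the neighbouring way number is invalid follows the earlier description of case 4 for the scanner 108, not this passage; index wrap-around is ignored. The name `select_target` is hypothetical.

```python
def select_target(case: int, sum_block_offset: int, sum_offset: int,
                  entries: dict, previous: bool = False) -> tuple:
    """Model of selector 670: pick a BN1 or BN2 form target from entries 620-626."""
    way, index = entries[625]                      # current way number and index 307
    if case in (2, 3):
        bn1x = entries[620 + sum_block_offset]     # one of entries 620-623
        if bn1x is not None:                       # valid: level-1 address is known
            return ("BN1", bn1x, sum_offset)
        return ("BN2", way, index, sum_block_offset, sum_offset)
    if case == 4:
        neighbour_way = entries[624] if previous else entries[626]
        if neighbour_way is None:                  # assumed fallback, as in case 1
            return ("LOOKUP",)
        new_index = index - 1 if previous else index + 1
        return ("BN2", neighbour_way, new_index, sum_block_offset, sum_offset)
    return ("LOOKUP",)                             # case 1: full active-table lookup
```

In this model, the three other sub-blocks would call the same function with their own sums, mirroring selectors 671-673.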
The other three sub-blocks, which process the other three instructions, operate independently on their respective instructions in the same manner. Each makes its own address boundary judgment and, according to that judgment, controls selector 671, 672, or 673 through control line 611, 612, or 613 to select content from storage entries 620-626. That content, together with the relevant part of the sum produced by the adder in the corresponding sub-block, fills entries 2, 3, and 4 of the track respectively.
The last entry of the track, the end track point, is filled by the output of selector 674. This selector is controlled directly by the block offset 306 in the base address 614 of the instruction segment. When block offset 306 is '00', selector 674 selects storage entry 621. If the valid bit in storage entry 621 is 'valid', selector 674 outputs the BN1X address in storage entry 621; if the valid bit is 'invalid', selector 674 outputs the way number and index bits 307 stored in storage entry 625. That output is merged with the block offset 306 of the sum produced by adder 607 plus '1' and with the offset 303 (BNY), and is sent to the end entry of a track in track table 110. When the block offset 306 in the base address is '01' or '10', selector 674 selects the content of storage entry 622 or 623 respectively; if that content is invalid, entry 625 is selected, as above. When the block offset 306 in the base address is '11', selector 674 selects storage entry 626 and reads out its way number, and selects storage entry 625 and reads out its index 307. The next-block way number from entry 626, the index 307 from entry 625 plus '1', and the block offset 306 and offset 303 of the sum produced by adder 607 are merged into a BN2 address, which is stored in the end track point entry of the above track.
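The end track point rule can be sketched as below. This is only an illustration under stated assumptions: the name `end_track_point` and the tuple encoding are invented for the example, and the sketch assumes the block offset of the successor wraps from '11' to '00' when the successor lies in the next level-two block.

```python
def end_track_point(l1_entries, way, index, next_way, base_block_offset, bny):
    """Model selector 674, which fills the end track point (hypothetical helper).

    l1_entries: four (valid, bn1x) pairs standing in for entries 620-623.
    way, index: contents of entry 625; next_way: way number in entry 626.
    base_block_offset: field 306 of base address 614.
    """
    next_offset = (base_block_offset + 1) & 0b11
    if base_block_offset == 0b11:
        # The sequential successor lies in the next level-two block.
        return ("BN2", next_way, index + 1, next_offset, bny)
    valid, bn1x = l1_entries[next_offset]
    if valid:
        return ("BN1", bn1x, bny)
    return ("BN2", way, index, next_offset, bny)


entries = [(True, 5), (True, 6), (False, 0), (True, 8)]
```

Note that for base offset '00' the sketch reads the second pair (entry 621), matching the text: the end track point always refers to the block after the current one.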
In this embodiment, active table 104 may also be built with multiple read/write ports so that multiple branch target addresses can access the active table simultaneously.
Figure 7 shows the memory and formats used in a micro track table organized in a fully associative manner. Figure 7A shows the structure of one memory 820 in a fully associative micro track block. Memory 820 contains six entries, corresponding to one level-two instruction block containing four level-one instruction blocks. Entry 710 stores the level-one instruction block number BN1X, and its valid signal, of the level-one instruction block whose intra-block offset within the level-two instruction block is '00'; entries 711, 712, and 713 likewise store the level-one instruction block numbers of the level-one instruction blocks whose intra-block offsets are '01', '10', and '11' respectively. Entry 714 stores the way number and index address 307 (index) of this level-two instruction block. Entry 715 stores the way number of the next level-two cache block.
Figure 8 shows an embodiment of the fully associative micro track table, in which module 110 is the track table and module 808 is a scanner that can replace scanner 108 in Figure 5. Functional block 801, similar to the instruction decoding and judgment module 601 in the embodiment of Figure 6, independently decodes the plural instructions of an instruction block passing through the scanner and computes their branch target addresses. For each instruction decoded as a branch instruction, functional block 801 adds the instruction's base address (as described for Figure 6, the high-order bits of the base addresses of these instructions are identical, but in this example the lowest two bits of the base address vary with the instruction's position in the instruction block) to the instruction's branch offset (the branch address displacement). The sum is the branch target address, which controls the selection of content in micro active block 881. As shown in Figure 7B, a branch target address can be divided into four parts, which from high-order to low-order bits are the micro-tag 721, the micro-index 722, the block offset 306, and the offset 303. Micro-tag 721 and micro-index 722 differ from the tag 311 and index 307 of the other embodiments of this disclosure. Micro-index 722 has only two bits, because each micro active block contains only four active table rows, each corresponding to a level-two instruction block that contains four level-one instruction blocks; the micro-index value equals the lowest two bits of active table index 307. The remaining bits of active table index 307 are therefore merged into micro-tag 721. The address itself is unchanged; this embodiment simply draws the boundary between tag and index at different bit positions. Micro-tag 721 comprises tag 311 together with all bits of active table index 307 other than the lowest two.
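The four-way split of a branch target address can be illustrated with a few shifts and masks. The function name `split_branch_target` and the parameter `offset_bits` are assumptions for this sketch; the text fixes the micro-index and block offset at two bits each but does not state the width of offset 303, so a width of 4 is used purely as an example.

```python
def split_branch_target(addr, offset_bits=4):
    """Split an address into (micro_tag, micro_index, block_offset, offset).

    offset_bits is a hypothetical width for offset 303; the micro-index 722
    and block offset 306 are two bits each, as in the text.
    """
    offset = addr & ((1 << offset_bits) - 1)
    block_offset = (addr >> offset_bits) & 0b11
    micro_index = (addr >> (offset_bits + 2)) & 0b11
    micro_tag = addr >> (offset_bits + 4)
    return micro_tag, micro_index, block_offset, offset


# Assemble an address from known fields, then split it again.
addr = (0x2A << 8) | (0b10 << 6) | (0b01 << 4) | 0x7
fields = split_branch_target(addr)
```

Because the micro-tag simply absorbs every bit above the micro-index, widening or narrowing the index only moves the tag boundary, which is the point the paragraph makes about the address being unchanged.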
The first three parts, 721, 722, and 306, are sent over buses 810, 811, 812, and 813 to each micro active block (such as micro active blocks 881 and 883) to control the selectors within them; the offset 303 is merged with the BNX output by the corresponding selector to form a complete BN address that fills an entry in track table 110. Returning to Figure 8, micro active block 881 contains memories 820, 821, 822, and 823, which store track table entries, and multiplexers 870, 871, 872, 873, and 874. Memories such as memory 820 have the structure shown in Figure 7A.
Micro active block 881 has a micro-tag register 851, which holds the base address of the run of consecutive instructions corresponding to the active table entries stored in micro active block 881. It also has four comparators 860, 861, 862, and 863. One input of each of the four comparators is connected to the output of register 851, and the other inputs are connected to the four branch target addresses 810, 811, 812, and 813 respectively. The four branch target addresses 810, 811, 812, and 813 are each sent to micro active blocks 881 and 883 (the latter has the same structure as micro active block 881) and compared with the micro-tag in each block's micro-tag register. In micro active block 881, suppose comparator 860 finds that the micro-tag part 721 of branch target address 810 equals the micro-tag in micro-tag register 851. Comparator 860 then lets the micro-index 722 and block offset 306 of branch target address 810 control selector 870. The micro-index selects one of the four memories: memory 820 when the micro-index is '00', and memories 821, 822, and 823 when it is '01', '10', and '11' respectively. The block offset 306 selects one of the four groups of BN1X address and valid bit in the selected memory. If the valid bit of the selected group is 'valid', selector 870 outputs the BN1X address of that group; if it is 'invalid', selector 870 outputs the way number and index bits 307 stored in entry 724 of memory 820, together with the block offset 306 on branch target address 810. This output is ORed by OR gate 840 with the same output port of the other micro active block 883, merged with the offset 303 from adder 607, and sent to track table 110, where it is written to the first entry of the track pointed to by address bus 505.
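A small Python model may help show why non-matching blocks must output all zeros: the outputs of all micro active blocks are simply ORed together. Everything about the encoding here is an assumption for illustration, including the marker bit `L1_FLAG`, the bit positions of way and index, the dictionary layout, and the choice to take way and index from the selected memory.

```python
L1_FLAG = 1 << 31  # hypothetical marker bit: the output carries a BN1X


def block_output(block, micro_tag, micro_index, block_offset):
    """Output of one micro active block for one branch target address."""
    if block["tag"] != micro_tag:
        return 0  # mismatch: all-zero output, so the OR gate ignores it
    mem = block["mems"][micro_index]        # one of memories 820-823
    valid, bn1x = mem["l1"][block_offset]   # one of entries 710-713
    if valid:
        return L1_FLAG | bn1x
    # Invalid BN1X: fall back to way number, index, and block offset.
    return (mem["way"] << 16) | (mem["index"] << 2) | block_offset


def lookup(blocks, micro_tag, micro_index, block_offset):
    """OR together the outputs of all micro active blocks (OR gate 840)."""
    out = 0
    for block in blocks:
        out |= block_output(block, micro_tag, micro_index, block_offset)
    return out


def make_block(tag):
    """Build a block whose four memories hold no valid BN1X yet."""
    mems = [{"l1": [(False, 0)] * 4, "way": 1, "index": 4 + i} for i in range(4)]
    return {"tag": tag, "mems": mems}


blocks = [make_block(5), make_block(9)]
blocks[1]["mems"][1]["l1"] = [(False, 0), (False, 0), (True, 0x33), (False, 0)]
```

In this toy encoding an all-zero result stands for "no block matched", in which case the text falls back to active table 104.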
In micro active block 881, suppose comparator 861 finds that the micro-tag part 721 of branch target address 811 differs from the micro-tag in micro-tag register 851. Comparator 861 then signals selector 871 to output all '0's, so that it does not affect the corresponding output of the other micro active blocks (such as micro active block 883). If micro-tag 721 of branch target address 811 matches none of the micro-tags stored in the micro active blocks, branch target 811 is sent to active table 104, the entry it points to is read, and that entry is filled into the second entry of the track pointed to by address bus 505 in the track table. Likewise, the remaining two branch target instruction addresses 812 and 813 control selectors 872 and 873 respectively to select one of the 16 BN1 values; or the way number and index bits 307 together with the block offset 306 of the target instruction address; or an all-'0' output. That output is merged with the corresponding BN1Y, ORed with the corresponding output of micro active block 883, and sent to track table 110 to be written to entries 3 and 4 of the above track. If an instruction is not a branch instruction, instruction decoding prevents the comparators for that instruction from comparing. For example, if instruction 892 is not a branch instruction, the valid bit of branch target address 812 is 'invalid', so the corresponding comparator 862 in each of micro active blocks 881 and 883 does not compare. The non-branch instruction type produced by instruction decoding is stored in the third entry of the above track in track table 110.
The next-block address stored in the end track point of the track is supplied by a similar compare, decode, and select function. Selector 874 is wired to memories 820 and so on differently from selectors 870-873: under the same address, selector 874 selects the input at the address sequentially following the one that selectors 870-873 select. For example, when the micro-index 722 and block offset 306 of the address are '0000', selectors 870-873 select entry 710 in memory 820, whereas selector 874 selects entry 711 in memory 820; when they are '0011', selectors 870-873 select entry 713 in memory 820, whereas selector 874 selects entry 710 in memory 821. The case '1111' is special: selectors 870-873 select entry 713 in memory 823, whereas selector 874 selects the way number in entry 715 of memory 823 and the level-two instruction block number in entry 714 plus '1', which together with the block offset 306 of the address form the next-block address. The micro-tag 721 of the current block address 814 (the base address of the instruction block being processed) is sent to each micro active block and compared with the micro-tag stored there. Suppose comparator 864 in micro active block 881 compares the micro-tag on current block address 814 with the output of micro-tag register 851 and finds them equal. Comparator 864 then lets the index 722 and block offset 306 parts of current block address 814 control selector 874. If the valid bit of the selected level-one instruction block address BN1X is 'valid', the entry is output. If it is 'invalid', selector 874 outputs the way number and index address 307 in entry 724 of memory 823, together with the block offset 306 on address 814. If the required next-block address is not in any micro active block but is in active table 104, it is filled into the end track point of the track in a similar manner. This completes the filling of the whole track. Figure 7C shows the address formats used in the track table. Address format 760 is the level-one cache address format, consisting of BN1X 761 and the offset BNY 303. Address format 780 is the level-two cache address format, consisting of way number 781, index address 307, block offset 306, and the offset BNY 303.
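The "sequentially next" wiring of selector 874 amounts to adding one to the concatenated micro-index and block offset, with a special case at the top. The sketch below reproduces the three examples in the text; the function name `selections` and the use of reference numerals as return values are choices made only for this illustration.

```python
def selections(micro_index, block_offset):
    """Return (memory, entry) picked by selectors 870-873 and by selector 874."""
    pos = (micro_index << 2) | block_offset
    current = (820 + (pos >> 2), 710 + (pos & 0b11))
    if pos == 0b1111:
        # Last level-one block of the last memory: the successor is the next
        # level-two block, described by entry 715 (with entry 714 plus 1).
        return current, (823, 715)
    nxt = pos + 1
    return current, (820 + (nxt >> 2), 710 + (nxt & 0b11))


first = selections(0b00, 0b00)
crossing = selections(0b00, 0b11)
last = selections(0b11, 0b11)
```

The `crossing` case shows the selection moving from memory 820 to memory 821, that is, from one level-two block's row to the next row of the same micro active block.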
Returning to Figure 8: further, suppose the micro-tags of the above branch targets 810, 811, 812, and 813 and of current block address 814 match none of the micro active blocks (such as micro active blocks 881 and 883) in scanner 800, as in this embodiment when branch target address 811 is sent to active table 104 to read an entry for filling track table 110. The row pointed to by branch target address 811 can then be filled into a micro active block (such as micro active block 883) designated by the replacement logic in scanner 800 (such as LRU; not shown in the figure), in the memory specified by the micro-index bits 722 of branch target 811, replacing the existing entries. For example, when the micro-index bits are '10', the content of memory 822 in micro active block 883 is replaced. To do so, the four BN1X values and their valid signals in the row of active table 104 pointed to by branch target 811 are filled in order into entries 710, 711, 712, and 713; the way number and index number 307 of that active table row are filled into entry 714 as the level-two cache block number of this block; and the way number in the next-block entry of that active table row (such as 503) is filled into entry 715. In addition, the micro-tag of branch target 811 is stored into micro-tag register 851 of micro active block 883, and the valid bits in memories 820, 821, and 823 are set to 'invalid'. The entries in memories 820, 821, and 823 can then be updated in cycles when the active table is not being accessed.
The replacement logic can designate a micro active block as the replacement victim according to a particular algorithm. Taking LRU as an example, each micro active block stores a multi-bit count value whose lowest bit is on the right. Whenever any comparator in the block matches, the count value is shifted left by one bit and a '1' is filled into the lowest bit. The replacement logic observes the count values of all blocks; if the lowest bit of any count value is '0', the micro active block holding that count value is the one to be replaced. If none of the count values has '0' in its lowest bit, the replacement logic shifts the count values in all micro active blocks right by one bit each, repeating until some count value has a lowest bit of '0'; the micro active block holding that count value is then the one to be replaced.
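The shift-register scheme above can be modelled in a few lines. This is a sketch under assumptions: the class name `MicroBlockLRU`, the 4-bit counter width, and the tie-break (lowest-numbered block wins) are not specified in the text.

```python
class MicroBlockLRU:
    """Shift-register replacement counters, one per micro active block."""

    def __init__(self, n_blocks, width=4):
        self.width = width              # hypothetical counter width in bits
        self.counts = [0] * n_blocks

    def touch(self, i):
        """A comparator in block i matched: shift left and fill in a '1'."""
        mask = (1 << self.width) - 1
        self.counts[i] = ((self.counts[i] << 1) | 1) & mask

    def victim(self):
        """Return the block to replace, shifting all counts right as needed."""
        while True:
            for i, count in enumerate(self.counts):
                if count & 1 == 0:
                    return i
            # No count ends in '0': age every block by one right shift.
            self.counts = [count >> 1 for count in self.counts]


lru = MicroBlockLRU(3)
for block in (0, 0, 1, 2, 2, 2):
    lru.touch(block)
chosen = lru.victim()
```

Each count is effectively a unary record of recent matches, so the block with the fewest recent matches reaches a '0' in its lowest bit first. The loop always terminates because right shifts bring in zeros from the top.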
The present invention can also use a micro active block with a set-associative structure to map the addresses of all instructions in an instruction block being scanned by scanner 108 simultaneously. The set-associative micro active block is structured like a reduced active table 104: for example, it has the same number of columns and the same entries but only 8 rows, and it has 4 read ports, corresponding to the at most 4 instructions in an instruction block. Each read port corresponds to one entry in track table 110. In addition, there are four sets of the selectors 521 and 531, comparator 420, tri-state gate 541, and so on of Figure 5. The 4 branch addresses of the 4 branch instructions are used to address the set-associative micro active block. The 4 micro-indexes (3 bits each in this example) read out 8 rows of content through the 4 read ports of each of the two arrays of the 2-way set; from each of the 8 groups of BN1X addresses, one is selected by the block offset 306 of the respective one of the 4 branch addresses. The 8 micro-tags (longer than tag 311, because they include the bits of index 307 other than the lowest 3) are compared in 8 comparators with the micro-tags of the 4 branch addresses. For each read port, the way whose comparison matches drives its tri-state gate to write the BN1X selected by 306 in that way to the track table entry corresponding to that read port. Each of the 4 read ports writes one entry of the track.
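A simplified software model of the 2-way, 8-row lookup is given below. The names `set_assoc_lookup` and `map_block` and the row layout are hypothetical, and the sketch omits the valid bits and the level-two fallback so that only the tag compare and way selection are shown.

```python
def set_assoc_lookup(ways, micro_tag, micro_index, block_offset):
    """Look up one branch target in a 2-way, 8-row micro active block.

    ways: two lists of 8 rows; a row is None or {"tag": int, "bn1x": [4 ints]}.
    Returns the selected BN1X, or None if neither way matches.
    """
    for way in ways:
        row = way[micro_index]
        if row is not None and row["tag"] == micro_tag:
            # The matching way "drives its tri-state gate".
            return row["bn1x"][block_offset]
    return None


def map_block(ways, targets):
    """Map up to four branch targets at once, one per read port."""
    return [set_assoc_lookup(ways, *target) for target in targets]


ways = [[None] * 8, [None] * 8]
ways[0][3] = {"tag": 7, "bn1x": [10, 11, 12, 13]}
ways[1][3] = {"tag": 8, "bn1x": [20, 21, 22, 23]}
mapped = map_block(ways, [(7, 3, 1), (8, 3, 0), (9, 3, 0), (7, 2, 0)])
```

In hardware the four lookups proceed in parallel through separate read ports; the list comprehension here only stands in for that parallelism.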
It should be noted that all the technical solutions described in the present invention can also be extended to cache systems with more levels.
Any other suitable modification may also be made in accordance with the technical solutions and concepts of the present invention. For those of ordinary skill in the art, all such substitutions, adjustments, and improvements fall within the scope of protection of the claims appended to the present invention.
Industrial applicability
The apparatus and method proposed by the present invention can be used in various cache-related applications and can improve cache efficiency.
Sequence listing free text

Claims (20)

  1. A high-performance instruction caching method, characterized in that a processor core is connected to a first memory containing executable instructions and to a second memory faster than the first memory; the method comprising:
    examining instructions being filled from the first memory into the second memory, thereby extracting instruction information that includes at least branch information;
    establishing a plurality of tracks according to the extracted instruction information;
    filling, according to one or more of the plurality of instruction tracks, at least one instruction that may be executed by the processor core from the first memory into the second memory;
    the method further comprising: the second memory is organized in a fully associative manner, and the first memory is organized in a set-associative manner.
  2. The method according to claim 1, characterized in that the tracks correspond one-to-one to the instruction blocks in the second memory.
  3. The method according to claim 1, characterized in that the target address is addressed by a level-one block number, thereby determining whether the target instruction belongs to an instruction block of the second memory.
  4. The method according to claim 1, characterized in that, through matching, a level-two block number is written into the track table, and when the instruction in the first memory is filled into the second memory, it is changed to a level-one block number.
  5. The method according to claim 1, characterized in that the tracks are scanned, and whenever a reference to an active table block number is found, the flag bit of the corresponding block number in the active table is set; meanwhile the flag bits of the block numbers in the active table are reset in turn, so that a set flag bit indicates a block number currently referenced by a track, and that block number is not replaced out of the active table.
  6. A high-performance instruction cache system, characterized in that the system comprises:
    a processor core configured to execute instructions;
    a first memory configured to store instructions required by the processor core;
    a second memory configured to store instructions required by the processor core, the second memory being faster than the first memory;
    a scanner configured to examine instructions being filled from the first memory into the second memory, thereby extracting instruction information that includes at least branch information;
    a track table configured to store a plurality of tracks established according to the extracted instruction information;
    the system further comprising:
    the second memory is organized in a fully associative manner; and
    the first memory is organized in a set-associative manner.
  7. The system according to claim 6, characterized in that the tracks in the track table correspond one-to-one to the instruction blocks in the second memory.
  8. The system according to claim 6, characterized in that each instruction block in the second memory corresponds to one level-one block number.
  9. The system according to claim 6, characterized in that the track table is scanned, and whenever a reference to an active table block number is found, the flag bit of the corresponding block number in the active table is set; meanwhile the flag bits of the block numbers in the active table are reset in turn, so that a set flag bit indicates a block number currently referenced by the track table, and that block number is not replaced out of the active table.
  10. The method according to claim 1, characterized in that, when the instruction block preceding or following, in sequential address order, an instruction block in the first memory is already stored in the first memory, the active table stores the storage location information, in the first memory, of said preceding or following instruction block corresponding to that instruction block.
  11. The method according to claim 10, characterized in that, when an instruction is located in the instruction block preceding or following the current instruction block in the first memory, the instruction can be found directly in the first memory according to the location information of said preceding or following instruction block stored in the active table.
  12. The method according to claim 10, characterized in that a boundary judgment is performed on the branch target instruction address; and according to the judgment result, branch target instructions at different locations are given addresses in different formats.
  13. The method according to claim 12, characterized in that, if the branch target instruction address is located in the instruction block preceding or following the instruction block in which the branch instruction resides in the first memory, the level-two block number of that preceding or following instruction block is used as the level-two block number of the branch target instruction, and the part of the branch target instruction address that is the address offset corresponding to the first memory is used as the offset of the branch target instruction.
  14. The method according to claims 1 and 10, characterized in that the active table content corresponding to the instructions being filled from the first memory into the second memory is stored in a micro active table; if the examination finds that the branch target instruction is located in a different level-one instruction block of the same level-two instruction block as the branch instruction, and the level-one block number corresponding to that level-one instruction block in the micro active table is valid, the level-one block number read from the micro active table is used directly as the level-one block number of the branch target instruction; if the examination finds that the branch target instruction is located in a different level-one instruction block of the same level-two instruction block as the branch instruction, but the level-one block number corresponding to that level-one instruction block in the micro active table is invalid, the level-two block number of the branch instruction is used directly as the level-two block number of the branch target instruction; if the examination finds that the branch target instruction is located in the level-two instruction block preceding or following that of the branch instruction, and the level-two block number corresponding to that preceding or following level-two instruction block in the micro active table is valid, the level-two block number read from the micro active table is used directly as the level-two block number of the branch target instruction.
  15. The method according to claims 1 and 10, characterized in that a plurality of level-two block numbers and the content corresponding to these block numbers in the active table are stored in a micro active table; when the examination finds a branch target instruction, the branch target instruction address is first matched in the micro active table; if the match succeeds, the level-one or level-two block number read from the micro active table is used directly as the level-one or level-two block number of the branch target instruction; if the match fails, the branch target instruction address is then sent to the active table for matching.
  16. 根据权利要求 6 所述***,其特征在于,主动表的表项与第一存储器中的指令块一一对应,每个表项存储了第一存储器中相应指令块的块地址;且当第一存储器中一个指令块对应的顺序地址的前一指令块或后一指令块已经存储在第一存储器中时,主动表中还存储了该指令块对应的所述前一指令块或后一指令块在第一存储器中的存储位置信息。According to claim 6 The system is characterized in that: the entries of the active table are in one-to-one correspondence with the instruction blocks in the first memory, each entry stores a block address of a corresponding instruction block in the first memory; and an instruction in the first memory When the previous instruction block or the subsequent instruction block of the sequential address corresponding to the block is already stored in the first memory, the active table further stores the previous instruction block or the subsequent instruction block corresponding to the instruction block in the first memory. Storage location information in .
  17. The method according to claim 16, characterized in that a boundary determination is performed on the branch target instruction address; and, according to the result of the determination, branch target instructions located at different positions are given addresses in different formats.
  18. The system according to claim 17, characterized in that the system comprises one or more adders; the adders are used to add the low-order bits of the part of the branch instruction's own address other than its offset in the first memory to the corresponding bits of the branch distance, and thereby to determine whether the branch target instruction is located in the instruction block that precedes or follows, in sequential address order, the instruction block in which the branch instruction resides in the first memory; when the branch target instruction is located in the instruction block preceding or following the current instruction block in the first memory, the instruction can be found directly in the first memory according to the location information of the preceding or following instruction block stored in the active table.
  19. The system according to claims 6 and 16, characterized in that the system further comprises a micro active table; the micro active table is used to store the active table content corresponding to the instructions being filled from the first memory into the second memory; when the scanner's examination finds that the branch target instruction is located in a different level-one instruction block within the same level-two instruction block as the branch instruction, and the level-one block number corresponding to that level-one instruction block in the micro active table is valid, the level-one block number read from the micro active table is used directly as the level-one block number of the branch target instruction; if examination finds that the branch target instruction is located in a different level-one instruction block within the same level-two instruction block as the branch instruction, but the level-one block number corresponding to that level-one instruction block in the micro active table is invalid, the level-two block number of the branch instruction is used directly as the level-two block number of the branch target instruction; if examination finds that the branch target instruction is located in the level-two instruction block preceding or following that of the branch instruction, and the level-two block number corresponding to that preceding or following level-two instruction block in the micro active table is valid, the level-two block number read from the micro active table is used directly as the level-two block number of the branch target instruction.
  20. The method according to claims 6 and 16, characterized in that the system further comprises a micro active table; the micro active table is used to store a plurality of level-two block numbers and the content corresponding to those block numbers in the active table; when the scanner's examination finds a branch target instruction, the branch target instruction address is first matched in the micro active table; if the match succeeds, the level-one block number or level-two block number read from the micro active table is used directly as the level-one block number or level-two block number of the branch target instruction; if the match fails, the branch target instruction address is then sent to the active table for matching.
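The decision sequence recited in claims 14 and 19 can be illustrated with a minimal Python sketch. It is not the patented hardware: the block geometry (`L1_BLOCK_SIZE`, `L1_PER_L2`), the `MicroActiveTable` layout, the function name `resolve_target`, and the tuple formats are all hypothetical choices made for illustration. The sketch treats a target in the branch's own level-one block the same way as a target in a different level-one block, and it returns `None` for cases the claims do not cover, on the assumption that such targets go to a full active table match as in claims 15 and 20.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical geometry (not taken from the patent): 16 instructions per
# level-one block, 4 level-one blocks per level-two block.
L1_BLOCK_SIZE = 16
L1_PER_L2 = 4
L2_BLOCK_SIZE = L1_BLOCK_SIZE * L1_PER_L2


@dataclass
class MicroActiveTable:
    """Active-table content for the level-two block currently being filled.

    A value of None stands for an invalid (not present) block number.
    """
    l1_block_numbers: List[Optional[int]]  # one slot per level-one sub-block
    prev_l2: Optional[int]                 # level-two block number of previous block
    next_l2: Optional[int]                 # level-two block number of next block


def resolve_target(branch_addr: int, distance: int, branch_l2_bn: int,
                   micro: MicroActiveTable) -> Optional[Tuple]:
    """Return a target address in level-one or level-two format, or None.

    ("L1", l1_block_number, offset)            -> level-one format
    ("L2", l2_block_number, sub_block, offset) -> level-two format
    None -> not resolvable here (assumed fallback: full active-table match)
    """
    target = branch_addr + distance
    # Boundary determination: how many level-two blocks away is the target?
    delta = target // L2_BLOCK_SIZE - branch_addr // L2_BLOCK_SIZE
    sub_block = (target % L2_BLOCK_SIZE) // L1_BLOCK_SIZE
    offset = target % L1_BLOCK_SIZE

    if delta == 0:
        l1_bn = micro.l1_block_numbers[sub_block]
        if l1_bn is not None:
            # Valid level-one block number: use it directly.
            return ("L1", l1_bn, offset)
        # Invalid: reuse the branch instruction's own level-two block number.
        return ("L2", branch_l2_bn, sub_block, offset)

    if delta in (-1, 1):
        neighbour = micro.prev_l2 if delta == -1 else micro.next_l2
        if neighbour is not None:
            # Adjacent level-two block with a valid number in the micro table.
            return ("L2", neighbour, sub_block, offset)

    return None


micro = MicroActiveTable(l1_block_numbers=[7, None, 9, None],
                         prev_l2=None, next_l2=5)
branch_addr = 3 * L2_BLOCK_SIZE + 2  # level-two block index 3, offset 2
print(resolve_target(branch_addr, 32, 12, micro))  # same L2 block, valid L1 number
print(resolve_target(branch_addr, 16, 12, micro))  # same L2 block, invalid L1 number
print(resolve_target(branch_addr, 64, 12, micro))  # next L2 block, valid number
```

Returning two tuple shapes mirrors the idea in claim 17 that targets at different positions receive addresses in different formats; a real design would encode these as bit fields rather than Python tuples.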
PCT/CN2014/085063 2013-02-08 2014-08-22 System and method for caching high-performance instruction WO2015024532A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US14/913,837 US20160217079A1 (en) 2013-02-08 2014-08-22 High-Performance Instruction Cache System and Method
US15/722,814 US10275358B2 (en) 2013-02-08 2017-10-02 High-performance instruction cache system and method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310379657.9 2013-08-23
CN201310379657 2013-08-23

Related Parent Applications (2)

Application Number Title Priority Date Filing Date
US14/766,754 Continuation US20150378935A1 (en) 2013-02-08 2014-01-29 Storage table replacement method
PCT/CN2014/071812 Continuation WO2014121740A1 (en) 2013-02-08 2014-01-29 Storage table replacement method

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US14/913,837 A-371-Of-International US20160217079A1 (en) 2013-02-08 2014-08-22 High-Performance Instruction Cache System and Method
US15/722,814 Continuation US10275358B2 (en) 2013-02-08 2017-10-02 High-performance instruction cache system and method

Publications (2)

Publication Number Publication Date
WO2015024532A1 (en) 2015-02-26
WO2015024532A9 (en) 2015-04-23

Family

ID=52483095

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/085063 WO2015024532A1 (en) 2013-02-08 2014-08-22 System and method for caching high-performance instruction

Country Status (2)

Country Link
CN (1) CN104424132B (en)
WO (1) WO2015024532A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106294352B (en) * 2015-05-13 2019-10-25 姚猛 A kind of document handling method, device and file system
US9431070B1 (en) 2015-08-31 2016-08-30 National Tsing Hua University Memory apparatus
CN105389270B (en) * 2015-12-22 2019-01-25 上海爱信诺航芯电子科技有限公司 A kind of system and method improving system on chip instruction cache hits rate
CN112905528A (en) * 2021-02-09 2021-06-04 深圳市众芯诺科技有限公司 Intelligent household chip based on Internet of things

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US5897655A (en) * 1996-12-10 1999-04-27 International Business Machines Corporation System and method for cache replacement within a cache set based on valid, modified or least recently used status in order of preference
CN101552032B (en) * 2008-12-12 2012-01-18 深圳市晶凯电子技术有限公司 Method and device for constructing a high-speed solid state memory disc by using higher-capacity DRAM to join in flash memory medium management
US8904156B2 (en) * 2009-10-14 2014-12-02 Oracle America, Inc. Perceptron-based branch prediction mechanism for predicting conditional branch instructions on a multithreaded processor
US8527707B2 (en) * 2009-12-25 2013-09-03 Shanghai Xin Hao Micro Electronics Co. Ltd. High-performance cache system and method
US8635408B2 (en) * 2011-01-04 2014-01-21 International Business Machines Corporation Controlling power of a cache based on predicting the instruction cache way for high power applications
US8756405B2 (en) * 2011-05-09 2014-06-17 Freescale Semiconductor, Inc. Selective routing of local memory accesses and device thereof
CN102841865B (en) * 2011-06-24 2016-02-10 上海芯豪微电子有限公司 High-performance cache system and method

Also Published As

Publication number Publication date
WO2015024532A1 (en) 2015-02-26
CN104424132B (en) 2019-12-13
CN104424132A (en) 2015-03-18

Similar Documents

Publication Publication Date Title
US6678815B1 (en) Apparatus and method for reducing power consumption due to cache and TLB accesses in a processor front-end
JP4437001B2 (en) Translation index buffer flush filter
EP3298493B1 (en) Method and apparatus for cache tag compression
WO2014000624A1 (en) High-performance instruction cache system and method
JPH04232551A (en) Method and apparatus for converting multiple virtaul addresses
JPH07200399A (en) Microprocessor and method for access to memory in microprocessor
WO2013000400A1 (en) Branch processing method and system
WO2015024532A1 (en) System and method for caching high-performance instruction
JP2006172499A (en) Address converter
JPH03141443A (en) Data storing method and multi-way set associative cash memory
JP2004062280A (en) Semiconductor integrated circuit
WO2014121737A1 (en) Instruction processing system and method
US10275358B2 (en) High-performance instruction cache system and method
JP3449487B2 (en) Conversion index buffer mechanism
TW201638774A (en) A system and method based on instruction and data serving
WO2007099598A1 (en) Processor having prefetch function
WO2013071868A1 (en) Low-miss-rate and low-miss-penalty cache system and method
WO2015070771A1 (en) Data caching system and method
US5467460A (en) M&A for minimizing data transfer to main memory from a writeback cache during a cache miss
WO2019022875A1 (en) Precise invalidation of virtually tagged caches
WO2018199646A1 (en) Memory device accessed on basis of data locality and electronic system including same
WO2014000626A1 (en) High-performance data cache system and method
JP2002215457A (en) Memory system
WO2016169518A1 (en) Instruction and data push-based processor system and method
US10140217B1 (en) Link consistency in a hierarchical TLB with concurrent table walks

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 14837748; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
WWE WIPO information: entry into national phase (Ref document number: 14913837; Country of ref document: US)
122 Ep: PCT application non-entry in European phase (Ref document number: 14837748; Country of ref document: EP; Kind code of ref document: A1)