CN1560736A - Device and method for store-induced instruction coherency - Google Patents

Device and method for store-induced instruction coherency

Info

Publication number
CN1560736A
CN1560736A CNA2004100560993A CN200410056099A
Authority
CN
China
Prior art keywords
memory page
instruction
logic device
positions
power
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2004100560993A
Other languages
Chinese (zh)
Inventor
罗德尼·E·胡克
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
INTELLIGENCE FIRST CO
IP First LLC
Original Assignee
INTELLIGENCE FIRST CO
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by INTELLIGENCE FIRST CO filed Critical INTELLIGENCE FIRST CO
Publication of CN1560736A publication Critical patent/CN1560736A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3802Instruction prefetching
    • G06F9/3812Instruction prefetching with instruction modification, e.g. store into instruction stream
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0844Multiple simultaneous or quasi-simultaneous cache accessing
    • G06F12/0846Cache with multiple tag or data arrays being simultaneously accessible
    • G06F12/0848Partitioned cache, e.g. separate instruction and operand caches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3861Recovery, e.g. branch miss-prediction, exception handling

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Advance Control (AREA)

Abstract

An apparatus and method in a pipeline microprocessor are provided for ensuring the coherency of instructions within the stages of the pipeline microprocessor. The apparatus includes instruction cache management logic and synchronization logic. The instruction cache management logic receives an address corresponding to a next instruction and detects that a portion of a memory page corresponding to the next instruction cannot be freely accessed without checking the coherency of the instructions within that portion of the memory page; upon such detection, it provides the address. The synchronization logic receives the address from the instruction cache management logic and directs data cache management logic to check the coherency of the instructions within that portion of the memory page. If the instructions are not coherent within that portion of the memory page, the synchronization logic directs the pipeline microprocessor to stall the fetch of the next instruction until the stages of the pipeline microprocessor have executed all previous instructions.

Description

Device and method for store-induced instruction coherency
Technical field
The present invention relates to the field of microelectronics, and more particularly to a technique, applied in a pipeline microprocessor, for ensuring the coherency of the instructions within its pipeline stages.
Background Art
Early microprocessors were designed to execute one program instruction at a time. Accordingly, an early microprocessor would fetch a first program instruction from program memory and execute it. After executing the first instruction, the early microprocessor would fetch a second program instruction from program memory and execute it. A third program instruction would then be fetched and executed, and so on. Because only one instruction was being executed at any particular point in time, an application programmer could design programs containing sections of self-modifying code.
In its simplest form, self-modifying code is program code that modifies itself. In the example above, self-modifying code exists if the execution result of the first program instruction is stored by the early microprocessor to a memory location from which the third program instruction is to be fetched. Early microprocessors employed self-modifying code for reasons such as the following: to save memory space, to avoid the call and return instructions of subroutine execution, to conceal instructions within an application program so as to deter infringement of copyright (or to covertly launch a virus), to interface to special hardware, or to improve overall performance. The technique of self-modifying code has since fallen out of favor, because such program code is generally difficult to understand and to maintain.
Despite the drawbacks noted above, self-modifying code remains in use for the reasons given. Early microprocessors, however, suffered no penalty for supporting the execution of self-modifying code, whereas present-day microprocessors are significantly impacted because they require additional logic to guarantee that program instructions are executed properly when sections of self-modifying code occur. This is because a present-day microprocessor no longer executes only one instruction at a time; it executes several instructions concurrently, in successive segments — that is, within the pipeline stages of the microprocessor. Hennessy and Patterson define a pipeline as "an implementation technique whereby multiple instructions are overlapped in execution." Computer Architecture: A Quantitative Approach, 2nd edition, by John L. Hennessy and David A. Patterson, Morgan Kaufmann Publishers, San Francisco, Calif., 1996. Hennessy and Patterson also offer the following description of a pipeline: a pipeline can be viewed as being like an assembly line. In an automobile assembly line there are many steps, each contributing something to the construction of the car. Each step operates in parallel with the other steps, though on a different car. In a computer pipeline, each step in the pipeline completes a part of an instruction. Like the assembly line, different steps are completing different parts of different instructions in parallel. Each step in the pipeline is called a pipe stage or a pipe segment. The stages are connected one to the next to form a pipe. Instructions enter at one end of the pipe, progress through the stages, and exit at the other end, just as cars do on an assembly line.
Thus, in a present-day microprocessor, instructions enter at one end of the pipeline and advance through a series of pipeline stages until execution completes, operating like an assembly line. A present-day pipeline microprocessor operates at peak efficiency when each of its pipeline stages is executing a different program instruction. Delays in fetching or executing instructions — caused by waiting for some pipeline stage to finish its operation — reduce pipeline efficiency (i.e., throughput). Present-day microprocessors, however, must support many different forms of operation and therefore frequently incur pipeline delays. For example, memory reads and writes are comparatively slow operations and often require the pipeline to stall. Conditional changes in program flow (i.e., conditional branch operations) also commonly cause the pipeline to stall. And the appearance of self-modifying code in the pipeline is yet another situation, in a present-day microprocessor pipeline, in which program flow must be interrupted.
Consider the example above, in which the execution result of the first program instruction is stored to the memory location corresponding to the third program instruction. If, when the store operation directed by the first program instruction occurs, the third program instruction has already been fetched from memory and is completing an operation in an earlier pipeline stage, then the third program instruction in that earlier pipeline stage is not the operation the application programmer intended to execute. This is because, at the time the memory location of the third program instruction was accessed to fetch it, the instruction the application programmer intended to execute had not yet been stored to that location. This example illustrates one situation of instruction incoherency in the pipeline caused by the presence of self-modifying code — a problem that present-day microprocessors must handle. A present-day microprocessor, however, cannot tell whether or not it is executing self-modifying code. Consequently, a present-day microprocessor must provide extensive logic to check the coherency of every instruction in the pipeline, either against every pending store operation or against store operations for instructions currently in the pipeline.
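The hazard just described can be illustrated with a toy sketch in ordinary Python — this is not processor logic, and every name in it is invented for illustration: the first instruction stores into the slot that the third instruction occupies, and a pipeline that prefetched that slot runs the stale contents.

```python
# Toy model of the store-into-instruction-stream hazard (illustrative only).
program = ["store 'add' into slot 2", "nop", "old-op"]

prefetched = list(program)   # the pipeline fetched ahead of execution
program[2] = "add"           # instruction 1's store lands in memory later

stale = prefetched[2]        # what an unsynchronized pipeline would run
correct = program[2]         # what the programmer intended to run
```

The mismatch between `stale` and `correct` is exactly the incoherency the coherency-checking logic must detect.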
What is needed, therefore, is an apparatus in a pipeline microprocessor for checking the coherency of the instructions in the pipeline, where the apparatus is comparatively simple and requires less logic.
Summary of the invention
The present invention is directed to solving the above problems, as well as other problems, disadvantages, and limitations of the prior art. The invention provides a superior technique for guaranteeing the coherency of the instructions in a present-day microprocessor pipeline in the presence of conditions caused by store operations that are pending or in flight.
To achieve these objects, the invention provides an apparatus in a pipeline microprocessor for ensuring the coherency of the instructions in the stages of the pipeline microprocessor. The apparatus includes: instruction cache management logic, configured to receive an address corresponding to a next instruction, to detect whether a portion of a memory page corresponding to the next instruction can be freely accessed without checking the coherency of the instructions within that portion of the memory page, and, if coherency must be checked, to provide the address; and synchronization logic, configured to receive the address from the instruction cache management logic and to direct data cache management logic to check the coherency of the instructions within that portion of the memory page, wherein, if the instructions are not coherent within that portion of the memory page, the synchronization logic directs the pipeline microprocessor to stall the fetch of the next instruction until the stages of the pipeline microprocessor have executed all previous instructions.
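As a rough illustration of the fetch-side flow just described, the following sketch models the stall/fetch decision under stated assumptions — the portion size, class names, and function names are all invented here, not taken from the patent:

```python
# Illustrative sketch of fetch-side synchronization (all names assumed).
PAGE_PORTION = 1024  # assumed size of one checked portion of a memory page

class DataCacheMgmt:
    def __init__(self):
        # target addresses of store operations pending in the data cache
        self.pending_store_addrs = set()

    def instructions_coherent(self, portion_base):
        # Coherent only if no pending store targets this page portion.
        return not any(portion_base == a - (a % PAGE_PORTION)
                       for a in self.pending_store_addrs)

class InstructionCacheMgmt:
    def __init__(self):
        # portions known to hold no pending stores may be fetched freely
        self.freely_accessible = set()

    def needs_check(self, fetch_addr):
        portion = fetch_addr - (fetch_addr % PAGE_PORTION)
        return portion not in self.freely_accessible

def fetch_next(fetch_addr, icache_mgmt, dcache_mgmt):
    """Synchronization logic: returns 'fetch' or 'stall'."""
    portion = fetch_addr - (fetch_addr % PAGE_PORTION)
    if not icache_mgmt.needs_check(fetch_addr):
        return "fetch"
    if dcache_mgmt.instructions_coherent(portion):
        icache_mgmt.freely_accessible.add(portion)
        return "fetch"
    # Incoherent: stall the fetch until the pipeline drains.
    return "stall"
```

The point of the sketch is that only portions not already marked freely accessible trigger a cross-check of the data side, which is the economy the apparatus claims over checking every fetch.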
The invention further provides an apparatus in a pipeline microprocessor for ensuring the coherency of the instructions in the stages of the pipeline microprocessor. The apparatus includes: data cache management logic, configured to receive an address corresponding to a pending store instruction, to detect whether a portion of a memory page corresponding to the store instruction can be freely accessed without checking the coherency of the instructions within that portion of the memory page, and, if coherency must be checked, to provide the address; and synchronization logic, configured to receive the address from the data cache management logic and to direct instruction cache management logic to check the coherency of the instructions within that portion of the memory page, wherein, if the instructions are not coherent within that portion of the memory page, the synchronization logic directs the pipeline microprocessor to flush the earlier stages of the pipeline microprocessor.
The invention also provides a method in a pipeline microprocessor for ensuring the coherency of the instructions in the stages of the pipeline microprocessor. The method includes: detecting, in a data cache, whether a portion of a memory page corresponding to a pending store instruction can be freely accessed without checking the coherency of the instructions within that portion of the memory page; directing logic in an instruction cache to check the coherency of the instructions within that portion of the memory page; and, if the instructions are not coherent, flushing the earlier stages of the pipeline microprocessor.
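The store-side method can be sketched in the same spirit; again the function name, the portion size, and the representation of the instruction side are assumptions made only for illustration:

```python
# Illustrative sketch of the store-side check (all names assumed).
def handle_pending_store(store_addr, icache_lines, portion_size=1024):
    """Return 'flush' if instructions within the store's page portion are
    incoherent (i.e., present on the instruction side), else 'proceed'."""
    portion = store_addr - (store_addr % portion_size)
    incoherent = any(
        portion == line - (line % portion_size) for line in icache_lines
    )
    return "flush" if incoherent else "proceed"
```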
That is to say, in one embodiment, the invention provides an apparatus in a pipeline microprocessor for ensuring the coherency of the instructions within the stages of the pipeline microprocessor. The apparatus includes instruction cache management logic and synchronization logic. The instruction cache management logic receives an address corresponding to a next instruction and performs a detection that determines that a portion of a memory page corresponding to the next instruction cannot be freely accessed until the coherency of the instructions within that portion of the memory page has been checked; upon completing the detection, it provides the address. The synchronization logic receives the address from the instruction cache management logic and directs data cache logic to check the coherency of the instructions within that portion of the memory page. If the instructions are not coherent, the synchronization logic directs the pipeline microprocessor to stall the fetch of the next instruction until the stages of the pipeline microprocessor have completed all their previous instructions.
The invention may also be regarded as an apparatus in a pipeline microprocessor for ensuring the coherency of the instructions within the stages of the pipeline microprocessor. The apparatus includes data cache management logic and synchronization logic. The data cache management logic receives an address corresponding to a pending store instruction and performs a detection that determines that a portion of a memory page corresponding to the pending store cannot be freely accessed until the coherency of the instructions within that portion of the memory page has been checked; upon completing the detection, it provides the address. The synchronization logic receives the address from the data cache management logic and directs instruction cache logic to check the coherency of the instructions within that portion of the memory page. If the instructions are not coherent, the synchronization logic directs the pipeline microprocessor to flush the earlier stages of the pipeline microprocessor.
The invention also comprehends a method in a pipeline microprocessor for ensuring the coherency of the instructions within the stages of the pipeline microprocessor. The method includes: performing a detection in a data cache that determines that a portion of a memory page corresponding to a pending store instruction cannot be freely accessed until the coherency of the instructions within that portion of the memory page has been checked; directing logic in an instruction cache to check the coherency of the instructions within that portion of the memory page; and, if the instructions are not coherent, flushing the earlier stages of the pipeline microprocessor.
These and other objects, features, and advantages of the present invention will become apparent from the following description and the accompanying drawings.
Brief Description of the Drawings
Fig. 1 is a prior-art block diagram illustrating exemplary stages of a present-day pipeline microprocessor.
Fig. 2 is a block diagram illustrating a present-day technique for detecting instruction incoherency in a microprocessor pipeline caused by store operations that are pending or in flight.
Fig. 3 is a block diagram illustrating a microprocessor according to the present invention for ensuring the coherency of the instructions in the pipeline when pending or in-flight store operations occur.
Fig. 4 is a block diagram illustrating the interrelationship of the instruction cache and the data cache within the microprocessor of Fig. 3.
Fig. 5 is a block diagram illustrating cache management logic according to the present invention.
Fig. 6 is a table illustrating synchronization actions taken by the present invention in response to a pending store operation to ensure the coherency of the instructions in the pipeline.
Fig. 7 is a table illustrating synchronization actions taken by the present invention prior to the fetch of a next instruction to ensure the coherency of the instructions in the pipeline.
Fig. 8 is a block diagram illustrating an alternative embodiment of cache management logic according to the present invention.
Fig. 9 is a table illustrating synchronization actions of an alternative embodiment of the present invention, employing the cache management logic of Fig. 8, in response to a pending store operation to ensure the coherency of the instructions in the pipeline.
Fig. 10 is a table illustrating synchronization actions of an alternative embodiment of the present invention, employing the cache management logic of Fig. 8, prior to the fetch of a next instruction to ensure the coherency of the instructions in the pipeline.
The reference numerals are described as follows:
100 microprocessor 102 fetch logic
104 translate logic 106 register logic
108 address logic 110 load logic
112 execute logic 114 store logic
116 write-back logic 118 instruction cache
120 data cache 122 bus interface unit
124 fetch bus 126 load bus
128 store bus 130, 132 buses
134 memory bus 200 block diagram
202 fetch stage logic 204 store stage logic
206 synchronization logic 208 pending store target detect logic
210 store check logic 216 instruction pointer register
218 micro operation register 220 target address register
222 stall signal 226 store target registers
228 instruction pointer register 300 microprocessor
302 fetch logic 304 translate logic
306 register logic 308 address logic
310 load logic 312 execute logic
314 store logic 316 write-back logic
318 instruction cache 320 data cache
322 bus interface unit 324 fetch bus
326 load bus 328 store bus
330, 332 buses 334 memory bus
336 instruction translation lookaside buffer 339 flush signal
340 synchronization logic 341 stall signal
342 bus 344 data translation lookaside buffer
400 block diagram 402 next instruction pointer
404 instruction cache 406 instruction cache management logic
408 instruction translation lookaside buffer (ITLB)
410 TLB entry
412, 414, 416, 418 ownership fields of memory page portions
420 bus 422 DSNOOP bus
424 synchronization logic 428 micro operation field
430 associated target address field 432 data cache
434 data cache management logic 436 ISNOOP bus
438 data translation lookaside buffer (DTLB)
440 TLB entry
442, 444, 446, 448 ownership fields of memory page portions
450 bus 500 cache management logic
502 TLB access logic 504 snoop logic
506 fetch logic 510 bus
512 store-out bus 514 store-in bus
516 snoop bus 800 cache management logic
801 TLB entry field 802 DTLB access logic
803 dirty bit
Detailed Description
In view of the above background discussion of the techniques employed in present-day pipeline microprocessors to ensure the coherency of instructions in the pipeline stages when self-modifying code occurs, examples of the related art will now be discussed with reference to Fig. 1 and Fig. 2. These examples particularly point out the drawbacks and limitations of the known techniques when they are applied in a present-day pipeline microprocessor to ensure store-induced instruction coherency. Following this, the present invention will be discussed with reference to Figs. 3-10, pointing out aspects, features, and advantages of the invention that are superior to the known techniques for ensuring instruction coherency in present-day pipeline microprocessors. The present invention achieves the functions required to guarantee instruction coherency, yet requires fewer resources and less power than the prior techniques.
Referring to Fig. 1, a block diagram illustrates exemplary stages of a present-day pipeline microprocessor 100. The microprocessor 100 has fetch logic 102 coupled to translate logic 104. The translate logic 104 is coupled to register logic 106. The register logic 106 is coupled to address logic 108. The address logic 108 is coupled to load logic 110. The load logic 110 is coupled to execute logic 112. The execute logic 112 is coupled to store logic 114. The store logic 114 is coupled to write-back logic 116. The microprocessor 100 has an instruction cache 118 coupled to the fetch logic 102 via a fetch bus 124. In addition, the microprocessor 100 has a data cache 120 coupled to the load logic 110 via a load bus 126 and coupled to the store logic 114 via a store bus 128. The instruction cache 118 and the data cache 120 are coupled to a bus interface unit 122 via buses 130 and 132, respectively. The bus interface unit 122 is connected to system memory (not shown) via a memory bus 134.
In operation, the fetch logic 102 fetches macro instructions from system memory for execution by the microprocessor 100. The instruction cache 118 is a fast on-chip memory whose function is to predict which regions of memory will be accessed for instruction fetches and to store those regions in advance, so that the fetch logic 102 can access them quickly from the instruction cache 118 without going through the bus interface unit 122 and the memory bus 134. If the memory region corresponding to a next instruction pointer is resident and valid in the instruction cache 118, the next instruction pointer is said to have experienced a "cache hit." A "cache miss" occurs when the memory region is not resident in the cache 118 and the next instruction must be fetched over the memory bus 134. A fetched macro instruction is provided to the translate logic 104, which translates it into an associated sequence of micro instructions (also called native instructions) for execution by the subsequent stages 106, 108, 110, 112, 114, 116 of the microprocessor 100. Each micro instruction in the associated sequence directs the logic in the subsequent pipeline stages 106, 108, 110, 112, 114, 116 to perform a sub-operation, such that a corresponding architectural operation prescribed by the fetched macro instruction is accomplished. The translated micro instructions are provided to the register logic 106. Operands used by sub-operations in the subsequent stages 108, 110, 112, 114, 116 are accessed from registers (not shown) within the register logic 106. The address logic 108 uses operands provided from the register logic 106, or provided by a micro instruction, to generate the virtual address of a memory location, so that for particular operations a required operand can be accessed either from the register logic 106 or from that virtual address. A generated address used for a data retrieval sub-operation is provided to the load logic 110, which retrieves the operand from memory via the data cache 120. Like the instruction cache 118, the data cache 120 is a fast on-chip memory whose function is to predict which regions of memory will be accessed and to store those regions in advance, so that the load logic 110 can quickly access operands via the data cache 120 without going through the bus interface unit 122 and the memory bus 134. Similarly, if the memory region corresponding to an operand load address is resident and valid in the data cache 120, a data "cache hit" is said to occur; a data "cache miss" occurs when the memory region is not resident in the data cache 120.
Like the preceding pipeline stages 102, 104, 106, 108, 110, the execute logic 112 performs the sub-operations prescribed by micro instructions, using the provided operands where required. If the results of the sub-operations performed by the execute logic 112 must be written to system memory, those results are provided to the store logic 114. The store logic 114 performs the sub-operation of storing results to memory locations. The store logic 114 is also coupled to the data cache 120 and, ideally, completes the requested store operation to the data cache 120 so that the result data need not be written over the memory bus 134. Execution results destined for architectural registers are provided to the write-back logic 116, which writes the results into the architectural registers. In summary, macro instructions fetched from memory are translated into associated micro instructions, and the associated micro instructions proceed in order through each subsequent logic stage 106, 108, 110, 112, 114, 116 in synchronization with a pipeline clock signal (not shown), so that the prescribed sub-operations can be performed concurrently, much like operations on an assembly line.
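The assembly-line behavior summarized above can be illustrated with a minimal lockstep simulation. This is purely illustrative: the stage names follow the figure, but the function and its mechanics are invented here, not part of the patent.

```python
# Minimal lockstep pipeline simulation (illustrative only).
STAGES = ["fetch", "translate", "register", "address",
          "load", "execute", "store", "write-back"]

def run_pipeline(program):
    pipe = [None] * len(STAGES)      # one slot per pipeline stage
    completed = []
    instrs = list(program)
    # Each clock, every instruction advances one stage, assembly-line style.
    while instrs or any(slot is not None for slot in pipe):
        finished = pipe[-1]          # instruction leaving write-back
        if finished is not None:
            completed.append(finished)
        pipe = [instrs.pop(0) if instrs else None] + pipe[:-1]
    return completed
```

Once the pipe fills, one instruction completes per clock and instructions complete in program order, which is the throughput advantage the coherency logic must not break.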
The block diagram of Fig. 1 depicts the essential elements of the microprocessor 100 needed to illustrate the present invention; for clarity, many of the logic operations within the microprocessor 100 have been omitted from the drawing. One skilled in the art will appreciate that the microprocessor 100 includes many stages and logic elements for particular operations, some of which have been aggregated here for brevity. For example, the load logic 110 could be embodied as a cache interface stage followed by a cache line alignment stage. Likewise, one skilled in the art will appreciate that the execute logic 112 in the microprocessor 100 may include multiple parallel execution units, for example an integer unit, a floating point unit, and a single-instruction/multiple-data unit.
To understand the necessity of instruction coherency within the pipeline stages 102, 104, 106, 108, 110, 112, 114, 116 of the microprocessor 100, note that instruction coherency must be checked in two cases: 1) when the fetch logic 102 fetches a next instruction from the instruction cache 118; and 2) when the store logic 114 asserts a store operation to the data cache 120. The first case arises when the store logic 114 has asserted to the data cache 120 a micro instruction directing a store operation to a memory location associated with the next instruction, but the data cache 120 has not yet written the modified location back to memory over the memory bus 134. In this case, if the fetch logic 102 retrieves the next instruction from the instruction cache 118, or from memory via the bus interface unit 122, the next instruction may be incorrect because of the store still pending in the data cache 120. The second case arises when an instruction has already been fetched and provided to one of the pipeline stages 102, 104, 106, 108, 110, 112 preceding the store stage 114, and a store micro instruction then directs a store operation to the memory location from which that instruction was fetched. In this case, if coherency is not managed, the instruction already fetched into the pipeline will be incorrect.
With regard to the first case: if logic within the microprocessor detects that a pending store operation is associated with the next instruction to be fetched, the fetch of the next instruction is typically delayed (in other words, the fetch logic 102 is stalled) until all other instructions have completed execution through the subsequent pipeline stages. A pipeline idle indication or signal (not shown) is commonly used to indicate that the pipeline of the microprocessor 100 is empty and that all pending store operations have been posted to memory. At that point it is safe to fetch the next instruction, because any store operation that could modify it has completed. With regard to the second case: when logic within the microprocessor detects that an instruction in the pipeline is associated with the target address (also referred to as the destination address) of a pending store instruction, the instructions in all stages of the pipeline are typically flushed, including the stage holding the instruction associated with the target location, and instruction fetch is stalled until the pipeline is idle; fetching then resumes from the location associated with the store's target address.
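The two prior-art rules above can be restated compactly. The function below and its return strings are invented for illustration; it simply encodes "case 1 stalls the fetch, case 2 flushes the pipeline":

```python
# Compact restatement of the two prior-art coherency rules (names assumed).
def coherency_action(next_fetch_addr, pipeline_instr_addrs, pending_store_targets):
    """pending_store_targets is a set of target addresses of pending stores."""
    # Case 1: a pending store targets the next instruction to be fetched.
    if next_fetch_addr in pending_store_targets:
        return "stall fetch until pipeline idle"
    # Case 2: a pending store targets an instruction already in the pipeline.
    if pending_store_targets & set(pipeline_instr_addrs):
        return "flush stages, refetch from store target"
    return "proceed"
```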
As noted above, store-induced instruction incoherency arising from self-modifying code requires a substantial amount of logic (and corresponding integrated circuit die area) to detect and manage. The amount of logic required is directly proportional to the number of stages present in the microprocessor pipeline. To illustrate further, the conventional technique by which present day microprocessors detect and manage instruction coherency is now discussed with reference to FIG. 2.
Referring to FIG. 2, a block diagram 200 illustrates a technique employed by present day microprocessors, such as that of FIG. 1, to detect instruction incoherency caused by pending or concurrent store operations. The block diagram 200 shows fetch stage logic 202 having a next instruction pointer register 216 coupled to pending store target detect logic 208. The pending store target detect logic 208 accesses a number of store target registers 226, that number being equal to the number of pending store operations the microprocessor can sustain. For example, depending on the memory model employed by the architecture, a present day microprocessor may have several store operations outstanding to write-back memory (WB buffers), several outstanding to write-combining memory (WC buffers), and several outstanding to memory of other types (STORE buffers). A detailed treatment of the different memory models and the corresponding memory attributes employed by present day microprocessor systems is beyond the scope of the present application; it suffices to observe that a present day microprocessor can post store operations to cache/memory, can buffer a number of store operations, and, to guarantee the coherency of instructions in the pipeline, must check the target addresses of all pending store operations, as held in the store target registers 226, against the address of the next instruction to be fetched. The block diagram 200 also depicts store stage logic 204, which includes store check logic 210. The store check logic 210 is coupled to a micro opcode register 218 and a target address register 220. The store check logic 210 is also coupled to a number of instruction pointer registers 228, equal in number to the pipeline stages preceding the store stage 204.
In operation, the pending store target detect logic 208 detects whether a pending store operation targets the same memory location as the next instruction to be fetched. The virtual address of the next instruction to be fetched is taken from the next instruction pointer register 216 and compared against all of the virtual addresses held in the store target registers 226, whose contents reflect all pending or concurrent store operations in the microprocessor. If a store target having the same virtual address as the next instruction pointer register 216 is detected, the pending store target detect logic 208 asserts a stall signal 222 to pipeline synchronization logic 206. In response to assertion of the stall signal 222, the synchronization logic 206 stalls the fetch of the next instruction until all activity in the pipeline has completed (that is, until the pipeline is idle). When the pipeline is idle, the synchronization logic 206 allows the next instruction to be fetched.
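The comparison performed by the pending store target detect logic 208 can be sketched in software. This is an illustrative model only, not the claimed hardware; the function name and its arguments are invented for the example:

```python
def must_stall_fetch(next_ip, store_targets):
    """Prior-art fetch-side check: stall the fetch of the next
    instruction if any pending store operation targets the same
    virtual address held in the next instruction pointer register."""
    return any(target == next_ip for target in store_targets)

# A pending store to 0x4000 collides with a fetch from 0x4000,
# so the fetch must stall until the pipeline drains.
print(must_stall_fetch(0x4000, [0x8000, 0x4000]))  # True
print(must_stall_fetch(0x4004, [0x8000]))          # False
```

Note that this one comparator bank must be replicated for every store target register 226, which is part of the logic cost the invention seeks to eliminate.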
As micro instructions are provided to the store stage 204, the store check logic 210 examines the contents of the micro opcode register 218 to detect a store micro instruction, and examines the contents of the target address register 220 to obtain the virtual target address of the store. The instruction pointer registers 228 contain the virtual addresses of the program instructions currently being operated upon by all of the stages of the microprocessor preceding the store stage 204. When the store check logic 210 detects that the contents of one of the instruction pointer registers 228 match the contents of the target address register 220, an incorrect instruction is executing in the pipeline. The store check logic 210 signals this event to the synchronization logic 206 over a flush bus 224; depending on the particular implementation, the flush bus 224 may also indicate which pipeline stage holds the incorrect instruction. The synchronization logic 206 accordingly directs the microprocessor to flush all earlier stages of the pipeline (or alternatively, all stages up to and including the stage holding the incorrect instruction), and to resume instruction fetch once the pipeline is idle.
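The store-side check can likewise be sketched as a comparison of the store's target address against the instruction pointers of all earlier stages; again this is an illustrative model with invented names, not the claimed hardware:

```python
def incoherent_stage(store_target, stage_ips):
    """Prior-art store-side check: return the index of the first
    earlier pipeline stage whose instruction was fetched from the
    store's target address, or None if the pipeline is coherent.
    stage_ips models the instruction pointer registers, one per
    stage preceding the store stage."""
    for stage, ip in enumerate(stage_ips):
        if ip == store_target:
            return stage
    return None

# The instruction in stage 1 was fetched from 0x2000, the address a
# store is about to modify, so stages up to stage 1 must be flushed.
print(incoherent_stage(0x2000, [0x1000, 0x2000, 0x3000]))  # 1
print(incoherent_stage(0x9000, [0x1000, 0x2000]))          # None
```

The number of comparators grows with the number of pipeline stages, which is the scaling problem the specification identifies next.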
The present invention observes that a substantial amount of logic is required to maintain and evaluate the contents of the store target registers 226 and the instruction pointer registers 228. During each cycle of the pipeline clock, the contents of these registers 226, 228 must be updated and evaluated. In addition, one skilled in the art will appreciate that the number of instruction pointer registers 228 that must be maintained and evaluated is directly proportional to the number of pipeline stages in the microprocessor. One skilled in the art will further appreciate that increasing the number of stages in a microprocessor pipeline is perhaps the most effective and widespread technique for increasing pipeline throughput in present day microprocessors. Finally, the present invention notes that the logic depicted in the block diagram 200 serves no purpose in a pipeline microprocessor other than detecting and managing store-induced instruction incoherency. It is therefore desirable to reduce the amount of logic devoted to detecting and managing store-induced instruction incoherency in present day microprocessors. The deficiencies and limitations of the prior art are overcome by the present invention, now discussed with reference to FIG. 3 through FIG. 10.
Referring to FIG. 3, a block diagram illustrates features of a microprocessor 300 according to the present invention for guaranteeing the coherency of pipeline instructions in the presence of pending or concurrent store operations. Like the microprocessor 100 of FIG. 1, the microprocessor 300 of the present invention has fetch logic 302 coupled to translate logic 304. The translate logic 304 is coupled to register logic 306. The register logic 306 is coupled to address logic 308. The address logic 308 is coupled to load logic 310. The load logic 310 is coupled to execute logic 312. The execute logic 312 is coupled to store logic 314. The store logic 314 is coupled to write back logic 316. The microprocessor 300 has an instruction cache 318 coupled to the fetch logic 302 via a fetch bus 324. In addition, the microprocessor 300 has a data cache 320 coupled to the load logic 310 via a load bus 326 and to the store logic 314 via a store bus 328. The instruction cache 318 and the data cache 320 are coupled to a bus interface unit 322 via buses 330 and 332, respectively. The bus interface unit 322 is connected to system memory (not shown) via a memory bus 334.
In addition, the instruction cache 318 includes an instruction translation lookaside buffer (ITLB) 336 coupled to synchronization logic 340 via a bus 338. The data cache 320 has a data translation lookaside buffer (DTLB) 344 coupled to the synchronization logic 340 via a bus 342. The synchronization logic outputs a flush signal 339 and a stall signal 341 to pipeline synchronization logic (not shown) that is substantially similar to the pipeline synchronization logic 206 of FIG. 2.
In operation, the fetch logic 302 fetches macro instructions from system memory for execution by the microprocessor 300. The instruction cache 318 anticipates which regions of memory will be accessed for instruction fetches and caches those regions in advance, so that the fetch logic 302 can access them rapidly from the instruction cache 318 rather than through the bus interface unit 322 and the memory bus 334. Fetched macro instructions are provided to the translate logic 304, which translates each macro instruction into an associated sequence of micro instructions for execution by the subsequent stages 306, 308, 310, 312, 314, 316 of the microprocessor 300. Each micro instruction in the associated sequence directs the logic in the subsequent pipeline stages 306, 308, 310, 312, 314, 316 to perform a sub-operation toward accomplishing the architectural operation prescribed by the corresponding fetched macro instruction. The translated micro instructions are provided to the register logic 306. Operands employed by sub-operations in the subsequent stages 308, 310, 312, 314, 316 are accessed from registers (not shown) in the register logic 306. The address logic 308 uses operands provided from the register logic 306, or provided by a micro instruction itself, to generate virtual addresses of memory locations from which operands required by particular sub-operations are to be retrieved. Generated addresses for data retrieval sub-operations are provided to the load logic 310, which retrieves the operands from memory via the data cache 320. The data cache 320 anticipates which regions of memory will be accessed and caches those regions in advance, so that the load logic 310 can rapidly access operands through the data cache 320 rather than through the bus interface unit 322 and the memory bus 334.
Like the preceding pipeline stages 302, 304, 306, 308, 310, the execute logic 312 uses provided operands (if required) to execute the sub-operations prescribed by the micro instructions. If results of the sub-operations executed by the execute logic 312 are to be written to system memory, those results are provided to the store logic 314. The store logic 314 performs the sub-operations that store the results to memory locations. The store logic 314 is also coupled to the data cache 320, so that store operations can preferably be posted to the data cache 320 without the result data having to be written over the memory bus 334. Execution results designated for storage in architectural registers are provided to the write back logic 316, which writes the results into the architectural registers. In summary, macro instructions fetched from memory are translated into associated micro instructions, and those micro instructions proceed in order through each of the subsequent logic stages 306, 308, 310, 312, 314, 316 in synchronization with a pipeline clock signal (not shown), so that the designated sub-operations can be performed concurrently, much like operations on an assembly line.
The block diagram of FIG. 3 depicts those components of the present invention that are essential; for clarity, much of the logic within the microprocessor 300 has been omitted from the figure. One skilled in the art will appreciate that the microprocessor 300 includes many stages and logic elements for particular operations, some of which have been aggregated here for brevity. For example, in one embodiment of the present invention, the load logic 310 may be embodied as a cache interface stage followed by a cache line alignment stage. Likewise, one skilled in the art will appreciate that the execute logic 312 of the microprocessor 300 may comprise multiple parallel execution units, for example an integer unit, a floating point unit, or combinations of these and other special-purpose units.
To detect and manage incoherency of the instructions in the pipeline stages 302, 304, 306, 308, 310, 312, 314, 316 of the microprocessor 300 according to the present invention, logic is provided within both translation lookaside buffers 336, 344 to indicate whether a memory region can be freely accessed without checking the coherency of the pipeline. Although not discussed with reference to FIG. 1 and FIG. 2, one skilled in the art will appreciate that present day microprocessors use the TLBs 336, 344 within the caches 318, 320 to cache the physical addresses and memory attributes (for example, read-only, write-combining) of the physical address pages to which frequently used virtual addresses map. For example, one skilled in the art would expect to find an entry in the ITLB 336 corresponding to the virtual address of the next instruction provided by the fetch logic 302. If that virtual address "hits" in the ITLB 336, then none of the address translation table lookups (for example, page directory and page table lookups) by which a present day microprocessor 300 translates a virtual address into a physical address need be performed; the physical address of the page in which the next instruction resides is cached in the ITLB 336, together with its memory attributes. If, however, the virtual address "misses" in the ITLB 336, the address translation lookups must be performed to produce the physical address of the next instruction. That physical address is then used as an index into the instruction cache 318 to determine whether the next instruction is contained therein. Similarly, translated addresses and memory attributes associated with load and store operations are cached in the DTLB 344.
One feature of the present invention provides an additional field in each entry of the ITLB 336 and the DTLB 344 to indicate whether a corresponding portion of a memory page can be freely accessed without checking the coherency of instructions from that portion of the memory page. In one embodiment, the portion of the memory page is one quarter of a page. Thus, for a microprocessor whose virtual address space provides 4 KB memory pages, this embodiment of the invention indicates whether a corresponding 1 KB portion of a memory page can or cannot be freely accessed without an instruction coherency check.
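For a 4 KB page divided into 1 KB quarters, the quarter in which an address falls is selected by two low-order address bits (bits 11:10, as FIG. 5 later illustrates). This small model, with invented names, shows the selection; it is not the claimed hardware:

```python
QUARTER_SHIFT = 10  # 1 KB quarter-page portions of a 4 KB page

def quarter_index(vaddr):
    """Select which quarter-page ownership field (QP0..QP3) covers
    a virtual address: bits 11:10 of the offset within a 4 KB page."""
    return (vaddr >> QUARTER_SHIFT) & 0x3

# Addresses 0x0000-0x03FF fall in QP0, 0x0400-0x07FF in QP1, etc.;
# the page number above bit 12 does not affect the selection.
print(quarter_index(0x0000))  # 0
print(quarter_index(0x0400))  # 1
print(quarter_index(0x1C00))  # 3
```

Other granularities described later (half-page, eighth-page, and so on) would simply use a different number of selection bits.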
In an architecture that allows a microprocessor to selectively enable virtual addressing, a preferred embodiment of the present invention performs the TLB lookups needed to guarantee instruction coherency even when virtual addressing is not in effect.
To detect pending or concurrent store operations when the fetch logic 302 fetches a next instruction from the instruction cache 318, the next instruction pointer (that is, the virtual address of the memory location containing the next instruction to be fetched) is provided to the ITLB 336 in the instruction cache 318 via the bus 324. If the next instruction pointer hits in the ITLB 336, the additional field is evaluated to determine whether the portion of the memory page corresponding to the next instruction can be freely accessed. If the evaluation indicates that the portion of the memory page can be freely accessed, the next instruction is fetched without checking the coherency of the instructions in the pipeline. If the next instruction pointer misses, or if the field indicates that the portion of the memory page cannot be freely accessed, the next instruction pointer is provided to the synchronization logic 340 via the bus 338.
In response to provision of the next instruction pointer, the synchronization logic 340 directs the DTLB 344 via the bus 342 to check, in a corresponding DTLB entry, the coherency of instructions within the indicated portion of the memory page. If the field in the DTLB entry indicates that the instructions are incoherent (that is, a store operation to the memory page containing the next instruction is pending or in progress), the synchronization logic 340 asserts the stall signal 341, causing the microprocessor to delay the fetch of the next instruction until the subsequent pipeline stages 304, 306, 308, 310, 312, 314, 316 have completed executing all previous instructions (that is, until the pipeline is idle and all store operations have been posted).
To detect incoherency of the instructions in the pipeline when a store micro instruction is to be executed by the store logic 314, the store micro instruction (which includes the target virtual address specifying the store operation) is provided to the DTLB 344 in the data cache 320 via the bus 328. If the target virtual address hits in the DTLB 344, the additional field of the selected DTLB entry is evaluated to determine whether the portion of the memory page corresponding to the target virtual address can be freely accessed without checking the coherency of instructions from that portion of the memory page. If the evaluation indicates that the portion of the memory page can be freely accessed, the store operation is posted to the data cache 320 without further checking. If the target virtual address misses, or if the field indicates that the portion of the memory page cannot be freely accessed, the target virtual address is provided to the synchronization logic 340 via the bus 342.
In response to provision of the target virtual address, the synchronization logic 340 directs the ITLB 336 via the bus 338 to check, in a corresponding ITLB entry, the coherency of instructions within the indicated portion of the memory page. If the field in the ITLB entry indicates that the instructions are incoherent (that is, one or more instructions have been fetched from the portion of the memory page to which the store operation is directed), the synchronization logic 340 asserts the flush signal 339, directing the microprocessor to flush the earlier pipeline stages and to stall the fetch of the next instruction until the pipeline is idle and all store operations have been posted.
The present invention thus enables a designer to exploit TLB logic already present in a pipeline microprocessor to indicate the coherency of instructions in a portion of a memory page with respect to instructions fetched from, and stores directed to, that portion of the memory page. By adding partial memory page ownership fields to existing TLB entries, a TLB entry can indicate, at a partial-page store granularity, whether the corresponding portion of a memory page can be freely accessed. When a store is posted to a portion of a memory page, the corresponding ownership field is set to indicate that the DTLB "owns" that portion of the memory page. When instructions are fetched from a portion of a memory page, the corresponding ownership field is set to indicate that the ITLB "owns" that portion of the memory page. The synchronization logic 340 manages the ownership fields in the ITLB 336 and the DTLB 344. In one embodiment, indication of ownership of the same portion of a memory page is exclusive. That is, if entries corresponding to a particular virtual address reside in both the DTLB 344 and the ITLB 336, only one of the two entries may indicate ownership of the portion of the memory page corresponding to that virtual address. In an alternative embodiment, both entries are allowed to own a portion of a memory page, and coherency is instead determined on the basis of whether the memory page is "dirty," as is discussed in detail below.
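The shape of a TLB entry extended with quarter-page ownership fields can be sketched as a data structure; the field names here are illustrative, not drawn from the specification:

```python
from dataclasses import dataclass, field

@dataclass
class TlbEntry:
    """Model of a TLB entry extended with quarter-page ownership
    bits (QP0..QP3), one per 1 KB portion of a 4 KB page. A set bit
    means this TLB (ITLB or DTLB) owns that portion and its cache
    may access it freely, without a pipeline coherency check."""
    vpn: int                  # virtual page number (lookup tag)
    pfn: int                  # cached physical frame number
    attrs: str = "WB"         # cached memory attributes (e.g. WB, WC)
    owned: list = field(default_factory=lambda: [False] * 4)

entry = TlbEntry(vpn=0x40, pfn=0x123)
entry.owned[2] = True  # e.g. a store posted to the third quarter
print(entry.owned)     # [False, False, True, False]
```

In the exclusive-ownership embodiment, setting a bit in one TLB's entry implies clearing the corresponding bit in the counterpart TLB's entry, as modeled in later sketches.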
Referring to FIG. 4, a block diagram 400 illustrates the interrelationship between the instruction cache 318 and the data cache 320 in the microprocessor of FIG. 3. The block diagram 400 shows a next instruction pointer 402 provided by the fetch logic (not shown). According to the present invention, the next instruction pointer 402 is provided to an instruction cache 404. The instruction cache 404 is coupled to synchronization logic 424 via a DSNOOP bus 422. The synchronization logic 424 is coupled to a data cache 432 via an ISNOOP bus 436. The instruction cache 404 includes instruction cache management logic (ICACHE MANAGER) 406, which accesses an instruction translation lookaside buffer (ITLB) 408 via a bus 420. The ITLB 408 includes a number of entries 410 as described above. In one embodiment of the present invention, each ITLB entry 410 also has a number of partial memory page ownership fields 412, 414, 416, 418, whose contents indicate whether the corresponding portions of the memory page associated with the TLB entry 410 can be freely accessed for instruction fetches without checking the coherency of the instructions in the pipeline. In the block diagram 400, four quarter-page ownership fields 412, 414, 416, 418 are provided per TLB entry 410, accomplishing store-induced instruction coherency management at a quarter-page store granularity. Other embodiments, by deleting ownership fields from or adding ownership fields to each TLB entry 410, can employ full-page, half-page, eighth-page, sixteenth-page, or thirty-second-page store granularities.
The block diagram 400 also depicts a micro opcode field 428 of a micro instruction according to the present invention, along with the micro instruction's associated target address field 430; these fields are provided to the data cache 432 by the store logic (not shown). The data cache 432 includes data cache management logic (DCACHE MANAGER) 434, which accesses a data translation lookaside buffer (DTLB) 438 via a bus 450. The DTLB 438 includes a number of entries 440 as described above. In one embodiment of the present invention, each DTLB entry 440 also has a number of partial memory page ownership fields 442, 444, 446, 448, whose contents indicate whether the corresponding portions of the memory page associated with the DTLB entry 440 can be freely accessed for store operations without checking the coherency of the instructions in the pipeline. In the block diagram 400, four quarter-page ownership fields 442, 444, 446, 448 are provided per DTLB entry 440, accomplishing store-induced instruction coherency management at a quarter-page store granularity. Other embodiments, by deleting ownership fields from or adding ownership fields to each DTLB entry 440, can employ full-page, half-page, eighth-page, sixteenth-page, or thirty-second-page store granularities.
In operation, the states of corresponding pairs of ownership bits 412, 442; 414, 444; 416, 446; 418, 448 in corresponding ITLB and DTLB entries 410, 440 are exclusive. That is, either the instruction cache 404 or the data cache 432 may own a given portion of a virtual memory page, but not both at the same time. When the next instruction pointer (NIP) 402 is provided to the instruction cache 404 for an instruction fetch, the instruction cache management logic 406 accesses its corresponding TLB entry 410 and ownership bit 412, 414, 416, or 418. (The particular partial memory page ownership bit 412, 414, 416, or 418 accessed is a function of the low-order address bits of the NIP 402, indicating the portion of the memory page in which the next instruction resides.) If the NIP 402 "hits" in the ITLB 408, and if the ownership bit 412, 414, 416, or 418 for its portion of the memory page indicates that the instruction cache 404 owns that portion of the memory page, the instruction is fetched without a coherency check. If the NIP 402 "misses," or if it hits but instruction cache ownership is not indicated, the NIP 402 is provided to the synchronization logic 424 via the DSNOOP bus 422. The synchronization logic 424 then provides the NIP 402 to the data cache management logic 434 via the ISNOOP bus 436 to look up ownership of the portion of the memory page in the DTLB 438. Within the data cache 432, the data cache management logic 434 accesses the TLB entry 440 and ownership bit 442, 444, 446, or 448 corresponding to the provided NIP 402. (As with the ITLB 408, the particular partial memory page ownership bit 442, 444, 446, or 448 accessed is a function of the low-order address bits of the NIP 402, indicating the portion of the memory page in which the next instruction resides.) If the NIP 402 hits in the DTLB 438, and if the ownership bit 442, 444, 446, or 448 for its portion of the page indicates that the data cache 432 owns that portion of the memory page, the synchronization logic 424 asserts a stall signal (STALL) 426, causing the microprocessor to delay the instruction fetch until the pipeline is idle and the store operations have completed. In addition, the synchronization logic 424 directs the data cache management logic 434 via the bus 436 to release ownership of the portion of the memory page, indicating the release by changing the value of the corresponding ownership field 442, 444, 446, or 448. Similarly, the synchronization logic 424 directs the instruction cache management logic 406 via the bus 422 to establish ownership of the portion of the memory page, indicating the establishment by changing the value of the corresponding ownership field 412, 414, 416, or 418. In one embodiment of the present invention, if a NIP 402 misses in the ITLB 408 and hits in the DTLB 438, then regardless of whether DTLB ownership is indicated, the DTLB entry 440 corresponding to the NIP 402 is copied by the synchronization logic 424 via the snoop buses 436, 422 to the corresponding entry 410 of the ITLB 408, thereby avoiding an address translation lookup sequence (that is, a page table walk).
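The fetch-path decision and the accompanying ownership transfer can be modeled for a single quarter page; this is a simplified software sketch under the exclusive-ownership embodiment, with invented names, and it omits the TLB hit/miss and entry-copy machinery:

```python
def fetch_check(nip_quarter, itlb_owned, dtlb_owned):
    """Fetch-path decision for one quarter page, given the ITLB and
    DTLB ownership bit lists (QP0..QP3). Returns 'fetch' when the
    instruction may be fetched without a coherency check, or 'stall'
    when the data cache owns the quarter: the fetch waits until the
    pipeline is idle and pending stores post, and ownership then
    moves from the DTLB entry to the ITLB entry."""
    if itlb_owned[nip_quarter]:
        return "fetch"                 # ITLB owns: fetch freely
    if dtlb_owned[nip_quarter]:
        dtlb_owned[nip_quarter] = False  # release DTLB ownership
        itlb_owned[nip_quarter] = True   # establish ITLB ownership
        return "stall"                 # drain pipeline first
    itlb_owned[nip_quarter] = True     # unowned: claim for the ITLB
    return "fetch"

itlb, dtlb = [False] * 4, [False] * 4
dtlb[1] = True                         # a store is pending to QP1
print(fetch_check(1, itlb, dtlb))      # stall
print(fetch_check(1, itlb, dtlb))      # fetch (ownership transferred)
```

After the stall, subsequent fetches from the same quarter proceed freely, which is the mechanism by which the invention avoids per-fetch address comparisons.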
When a target address 430 corresponding to a store micro operation 428 is provided to the data cache 432, the data cache management logic 434 accesses its corresponding TLB entry 440 and ownership bit 442, 444, 446, or 448. (The particular partial memory page ownership bit 442, 444, 446, or 448 accessed is a function of the low-order address bits of the target address 430, indicating the portion of the memory page to which the store is directed.) If the target address 430 hits in the DTLB 438, and if the ownership bit 442, 444, 446, or 448 indicates that the data cache 432 owns that portion of the memory page, the store is posted without a coherency check. If the target address 430 misses, or if it hits but data cache ownership is not indicated, the target address 430 is provided to the synchronization logic 424 via the ISNOOP bus 436. The synchronization logic 424 then provides the target address 430 to the instruction cache management logic 406 via the DSNOOP bus 422 to look up ownership of the portion of the memory page in the ITLB 408. Within the instruction cache 404, the instruction cache management logic 406 accesses the TLB entry 410 and ownership bit 412, 414, 416, or 418 corresponding to the provided target address 430. (As with the DTLB 438, the particular partial memory page ownership bit 412, 414, 416, or 418 accessed is a function of the low-order address bits of the target address 430, indicating the portion of the memory page to which the pending store is to be posted.) If the target address 430 hits in the ITLB 408, and if the ownership bit 412, 414, 416, or 418 for its portion of the memory page indicates that the instruction cache 404 owns that portion of the memory page, the synchronization logic 424 asserts a flush signal (FLUSH) 425, causing the microprocessor to flush the previous instructions from the earlier pipeline stages and to delay instruction fetch until the pipeline is idle and the store operation completes. In addition, the synchronization logic 424 directs the instruction cache management logic 406 via the bus 422 to release ownership of the portion of the memory page, indicating the release by changing the value of the corresponding ownership field 412, 414, 416, or 418. Similarly, the synchronization logic 424 directs the data cache management logic 434 via the bus 436 to establish ownership of the portion of the memory page, indicating the establishment by changing the value of the corresponding ownership field 442, 444, 446, or 448. In one embodiment of the present invention, if a target address 430 misses in the DTLB 438 and hits in the ITLB 408, then regardless of whether ITLB ownership is indicated, the ITLB entry 410 corresponding to the target address 430 is copied by the synchronization logic 424 via the snoop buses 436, 422 to the corresponding entry 440 of the DTLB 438, thereby avoiding an address translation lookup sequence (that is, a page table walk).
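The store path mirrors the fetch path; this sketch models the decision and ownership transfer for one quarter page under the same simplifying assumptions as the fetch-path sketch (exclusive ownership, hit/miss and entry-copy machinery omitted, names invented):

```python
def store_check(target_quarter, itlb_owned, dtlb_owned):
    """Store-path decision for one quarter page. Returns 'post' when
    the store may be posted without a coherency check, or 'flush'
    when the instruction cache owns the quarter: earlier pipeline
    stages are flushed, fetch stalls until the store posts, and
    ownership moves from the ITLB entry to the DTLB entry."""
    if dtlb_owned[target_quarter]:
        return "post"                      # DTLB owns: post freely
    if itlb_owned[target_quarter]:
        itlb_owned[target_quarter] = False   # release ITLB ownership
        dtlb_owned[target_quarter] = True    # establish DTLB ownership
        return "flush"                     # possible stale instructions
    dtlb_owned[target_quarter] = True      # unowned: claim for the DTLB
    return "post"

itlb, dtlb = [False] * 4, [False] * 4
itlb[0] = True                             # instructions fetched from QP0
print(store_check(0, itlb, dtlb))          # flush
print(store_check(0, itlb, dtlb))          # post (ownership transferred)
```

Repeated stores to the same quarter (the common case for data pages) thereafter post with no coherency checking at all.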
Referring to FIG. 5, a block diagram illustrates cache management logic 500 according to the present invention. The cache management logic 500 depicts the configuration of either the instruction cache management logic 406 or the data cache management logic 434 of FIG. 4. The cache management logic 500 includes TLB access logic 502, which is coupled to snoop logic 504 via a register-out bus 512 and to snarf logic 506 via a register-in bus 514. The snoop and snarf logic 504, 506 are coupled to the synchronization logic (not shown) via a snoop bus 516. A virtual address (corresponding either to the next instruction pointer or to a store target address) is provided to the TLB access logic 502 via a bus 510. For illustrative purposes, FIG. 5 shows a 32-bit address bus 510, although the present invention comprehends other virtual address spaces as well, such as 16-, 64-, 128-, and 256-bit addressing schemes. The bus 510 is also routed to the snoop logic 504. The upper bits of the virtual address, ADDR 31:12, are provided to a translation lookaside buffer, which returns a corresponding TLB entry via an entry data bus (ENTRY DATA).
In operation, when a virtual address is provided to the cache management logic 500, the TLB access logic 502 sends the upper bits ADDR[31:12] to the TLB to determine whether a corresponding entry is present (that is, whether the lookup hits). If so, the entry and all of its ownership bits are copied to an entry buffer 508 within the TLB access logic 502. For illustration, the block diagram shows address bits ADDR[31:12] being sent to the TLB, denoting a 4KB virtual page; however, the invention applies to other virtual page sizes as well, including 1KB, 8KB, 16KB, and so on. Using low-order bits of the virtual address (bits 11:10 in this example), the TLB access logic 502 evaluates the corresponding ownership field QP3, QP2, QP1, or QP0 to determine whether the relevant portion of the virtual memory page can be freely accessed without checking ownership in the paired TLB. If ownership is not indicated, or if the virtual address misses altogether, the snoop logic 504 sends the virtual address to the synchronization logic (not shown) to snoop the other TLB via the snoop bus 516. If the state of the corresponding entry in the other TLB indicates that it may be copied to this TLB access logic, the entry is captured by the snarf logic 506 from the snoop bus 516 and provided to the TLB access logic via the in bus 514. When a snoop is initiated by the paired cache management logic over the snoop bus, the snarf logic 506 forwards the supplied virtual address to the TLB access logic 502 via the in bus 514. Upon retrieval and examination of the corresponding TLB entry, if the entry indicates that it may be copied to the paired TLB, the entry is driven onto the snoop bus 516 by the snoop logic 504 via the out bus 512. The TLB access logic 502 is directed by the synchronization logic to change the state of the ownership fields QP3-QP0.
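The free-access test performed by the TLB access logic 502 can be modeled in software as follows (an illustrative sketch, not the patent's hardware; the type and function names are ours). An access proceeds without a snoop only when the TLB holds an entry for the page and the ownership bit for the addressed quarter page is set:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class TlbEntry:
    physical_page: int
    qp: List[bool]   # qp[i] models ownership field QPi for quarter page i

def can_access_freely(tlb: Dict[int, TlbEntry], vaddr: int) -> bool:
    """True when the lookup hits AND the addressed quarter page is owned,
    so the access completes without snooping the paired TLB."""
    vpn = vaddr >> 12               # ADDR[31:12] indexes the TLB
    quarter = (vaddr >> 10) & 0x3   # ADDR[11:10] selects QP3..QP0
    entry = tlb.get(vpn)
    return entry is not None and entry.qp[quarter]

# A TLB owning only quarter 0 of virtual page 0x40:
tlb = {0x40: TlbEntry(physical_page=0x7F, qp=[True, False, False, False])}
```

A miss, or a hit on a quarter whose ownership bit is clear, would instead trigger the snoop/snarf exchange described above.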
Referring to Fig. 6, a table 600 describes the synchronization actions taken in response to a pending store operation according to the present invention, to ensure coherency of instructions in the pipeline. As described above with reference to Figs. 3 through 5, if a destination address provided to the DTLB hits, and the DTLB has ownership of the portion of the memory page to which the store operation is pending, then the store may complete without checking coherency of instructions in the pipeline. If the DTLB hits but does not have ownership of that portion of the memory page, the ITLB is snooped. If the ITLB snoop hits and indicates ownership of the portion of the memory page, the pipeline is flushed and stalled until the store operation completes and the pipeline is idle. The ownership bits in the ITLB are set to indicate that the ITLB no longer owns the portion of the memory page, and the corresponding ownership bits in the DTLB are set to indicate that the DTLB now owns it. If the ITLB snoop hits but does not indicate ownership, the DTLB marks ownership of the portion of the memory page and the store operation completes. If the DTLB misses, an ITLB snoop is initiated. When the ITLB snoop hits, its TLB entry is copied to the DTLB. If the ITLB indicates ownership of the portion of the memory page, the pipeline is flushed and stalled until idle. The ownership bits in the ITLB are set to indicate that the ITLB no longer owns the portion of the memory page, and the corresponding ownership bits in the DTLB are set to indicate that the DTLB now owns it. If the ITLB entry does not indicate ownership, the DTLB entry is set to indicate ownership of the portion of the memory page, and the store operation completes using the physical address and attribute information copied from the ITLB.
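The decision table 600 just walked through can be sketched as a small procedure (a hedged software model, not the hardware itself; the action names are ours, and the booleans mirror the hit/ownership columns of the table):

```python
from typing import List

def store_sync_actions(dtlb_hit: bool, dtlb_owns: bool,
                       itlb_hit: bool, itlb_owns: bool) -> List[str]:
    """Ordered actions for a pending store per the Fig. 6 table."""
    if dtlb_hit and dtlb_owns:
        return ["complete_store"]             # no coherency check required
    actions = ["snoop_itlb"]
    if not dtlb_hit and itlb_hit:
        actions.append("copy_entry_to_dtlb")  # snarf the entry on a DTLB miss
    if itlb_hit and itlb_owns:
        # Possible store to the instruction stream: flush, wait for idle,
        # then transfer ownership from the instruction side.
        actions += ["flush_pipeline", "stall_until_idle",
                    "clear_itlb_ownership"]
    actions += ["set_dtlb_ownership", "complete_store"]
    return actions
```

Note that the common case (DTLB hit with ownership) returns immediately, which is the point of the quarter-page ownership bits: most stores pay no synchronization cost.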
Referring to Fig. 7, a table 700 describes the synchronization actions performed before fetching a next instruction according to the present invention, to ensure coherency of instructions in the pipeline. If the virtual address for the next instruction is provided to the ITLB and hits, and the ITLB has ownership of the portion of the memory page containing the next instruction, then the instruction fetch may complete without checking coherency of instructions in the pipeline. If the ITLB hits but does not have ownership of the portion of the memory page, the DTLB is snooped. If the DTLB snoop hits and indicates ownership of the portion of the memory page, the pipeline is stalled until the store operation completes and the pipeline is idle. When the pipeline is idle, the next instruction is fetched. The ownership bits in the DTLB are set to indicate that the DTLB no longer owns the portion of the memory page, and the corresponding ownership bits in the ITLB indicate that the ITLB now owns it. If the DTLB snoop hits but does not indicate ownership, the ITLB marks ownership of the portion of the memory page and the instruction is fetched. If the ITLB misses, a DTLB snoop is initiated. When the DTLB snoop hits, its TLB entry is copied to the ITLB. If the DTLB indicates ownership of the portion of the memory page, the pipeline is stalled until idle. When the pipeline is idle, the next instruction is fetched. The ownership bits in the DTLB are set to indicate that the DTLB no longer owns the portion of the memory page, and the corresponding ownership bits in the ITLB indicate that the ITLB now owns it. If the DTLB entry does not indicate ownership, the ITLB entry is set to indicate ownership of the portion of the memory page, and the next instruction is fetched using the physical address and attribute information copied from the DTLB.
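Table 700 is the fetch-side mirror of table 600, and the symmetry is easiest to see in the same sketch form (again an illustrative software model with names of our choosing, not the hardware):

```python
from typing import List

def fetch_sync_actions(itlb_hit: bool, itlb_owns: bool,
                       dtlb_hit: bool, dtlb_owns: bool) -> List[str]:
    """Ordered actions before fetching the next instruction per Fig. 7."""
    if itlb_hit and itlb_owns:
        return ["fetch_instruction"]          # no coherency check required
    actions = ["snoop_dtlb"]
    if not itlb_hit and dtlb_hit:
        actions.append("copy_entry_to_itlb")  # snarf the entry on an ITLB miss
    if dtlb_hit and dtlb_owns:
        # A store to this quarter page may be in flight: wait until the
        # pipeline is idle, then transfer ownership to the instruction side.
        actions += ["stall_until_store_completes", "clear_dtlb_ownership"]
    actions += ["set_itlb_ownership", "fetch_instruction"]
    return actions
```

Comparing the two sketches shows that ownership simply migrates toward whichever side (fetch or store) last needed the quarter page, with a flush or stall only at the crossover.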
Referring to Fig. 8, a block diagram illustrates an alternative embodiment of cache management logic 800 according to the present invention. The alternative cache management logic 800 comprises components substantially similar in operation to corresponding components of the cache management logic 500 of Fig. 5. In contrast to the cache management logic 500 of Fig. 5, however, the states of the partial-memory-page ownership fields QP3-QP0 in this alternative embodiment 800 are not restricted to designating only one of the instruction cache and the data cache. As noted above, both the instruction cache and the data cache may hold ownership of the portion of the memory page corresponding to a provided virtual address. This allows data cache reads to proceed without affecting instruction fetches. If a data cache read is performed, ownership of the corresponding memory page is established in the DTLB access logic 802, but the synchronization logic does not require the instruction cache to release its ownership. Consequently, if an instruction fetch occurs to that portion of the memory page, the fetch may complete without snooping the DTLB. Since the instruction cache and the data cache may both own a particular portion of a memory page, this embodiment requires the DTLB access logic 802 to examine the state of a dirty bit 803 in the accessed TLB entry field 801: during a DTLB snoop initiated by the instruction cache, if no store operation is pending to or has been posted to the portion of the memory page, or if only read operations have been issued to that portion, the instruction fetch proceeds without stalling the pipeline. Those skilled in the art will appreciate that the dirty bit 803 is one of the MESI cache line state indicators, generally used to indicate that the contents of a cache entry have been modified but not yet written back to memory. Although the dirty bit 803 is shown as part of the TLB entry 801, those skilled in the art will recognize that dirty state is also maintained at the granularity at which data is stored in the cache, conventionally at cache line granularity. Pipeline synchronization according to this alternative embodiment of the cache management logic 800 is discussed in further detail with reference to Figs. 9 and 10.
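The relaxation this alternative embodiment allows can be stated as a one-line predicate (an illustrative sketch with our own names, under the assumption just described that the dirty bit tracks pending modification of the quarter page):

```python
def fetch_may_proceed_without_stall(dtlb_snoop_hit: bool, dirty: bool) -> bool:
    """Fig. 8 embodiment: an instruction fetch that snoops the DTLB stalls
    only when the data side holds the quarter page with a pending
    modification (dirty bit 803 set); shared read ownership never blocks."""
    return not (dtlb_snoop_hit and dirty)
```

In the first embodiment, by contrast, any DTLB ownership hit would block the fetch, even when the data side had only read the quarter page.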
Referring to Fig. 9, a table 900 describes in detail the synchronization actions taken in response to a pending store operation according to the alternative embodiment 800 of Fig. 8, to ensure coherency of instructions in the pipeline. According to this alternative embodiment 800, if a destination address provided to the DTLB hits, and the DTLB owns the portion of the memory page to which the store is pending, and the memory page (or the particular cache line associated with the destination address) is marked as modified, then the store completes without checking coherency of instructions in the pipeline. If the destination address hits the DTLB, and the DTLB owns the portion of the memory page targeted by the pending store, but the memory page (or corresponding cache line) is not marked as modified, then the ITLB must be snooped. If the ITLB snoop hits with ownership, the pipeline must be flushed and instruction fetching stalled until the pipeline is idle and the store is committed. In addition, the ownership bits in the ITLB are set to indicate that the ITLB no longer owns the memory page. If the ITLB snoop does not hit with ownership, the store is committed. If the DTLB hits but does not own the portion of the memory page, the ITLB is snooped. If the ITLB snoop hits with ownership of the memory page, the pipeline is flushed and stalled until the store completes and the pipeline is idle. In addition, the ownership bits in the ITLB are set to indicate that the ITLB no longer owns the portion of the memory page, and the corresponding ownership bits in the DTLB are set to indicate that the DTLB now owns it. If the ITLB snoop hits but does not indicate ownership, the DTLB marks ownership of the portion of the memory page and the store completes. If the DTLB misses, an ITLB snoop is initiated. When the ITLB snoop hits, its TLB entry is copied to the DTLB. If the ITLB indicates ownership of the portion of the memory page, the pipeline is flushed and stalled until idle, including committing the store. In addition, the ownership bits in the ITLB are set to indicate that the ITLB no longer owns the portion of the memory page, and the corresponding ownership bits in the DTLB are set to indicate that the DTLB now owns it. If the ITLB entry does not indicate ownership, the DTLB entry is set to indicate ownership of the portion of the memory page, and the store completes using the physical address and attribute information copied from the ITLB.
Fig. 10 is a table describing the synchronization actions performed before fetching a next instruction according to the alternative embodiment 800 of Fig. 8, to ensure coherency of instructions in the pipeline. If the virtual address of the next instruction (provided to the ITLB) hits the ITLB, and the ITLB owns the portion of the memory page in which the next instruction is stored, then the instruction fetch completes without checking coherency of instructions in the pipeline. If the ITLB hits but does not own the portion of the memory page, the DTLB is snooped. If the DTLB snoop hits and the corresponding memory page (or cache line, where cache line state is maintained according to the MESI protocol) is marked as modified, the pipeline is stalled until the store completes and the pipeline is idle. When the pipeline is idle, the next instruction is fetched. If the DTLB snoop hits but the corresponding memory page (or cache line) is not marked as modified, the ITLB marks ownership of the portion of the memory page and the instruction is fetched. If the ITLB misses, a DTLB snoop is initiated. When the DTLB snoop hits, its TLB entry is copied to the ITLB. If the data cache snoop indicates that the corresponding portion of the memory page (that is, the portion of the memory page or cache line) is marked as modified, the pipeline is stalled until it is idle. When the pipeline is idle, the next instruction is fetched. If the data cache snoop indicates that the portion of the memory page (or cache line) is not marked as modified, the next instruction is fetched using the physical address and attribute information copied from the DTLB.
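The Fig. 10 table differs from Fig. 7 in that the stall decision is keyed on the modified state of the snooped page (or line) rather than on ownership alone; the following sketch (illustrative software model, names ours) captures that difference:

```python
from typing import List

def alt_fetch_sync_actions(itlb_hit: bool, itlb_owns: bool,
                           dtlb_snoop_hit: bool, modified: bool) -> List[str]:
    """Fetch-side synchronization in the Fig. 8 alternative embodiment,
    per the Fig. 10 table: only a modified page/line forces a stall."""
    if itlb_hit and itlb_owns:
        return ["fetch_instruction"]
    actions = ["snoop_dtlb"]
    if not itlb_hit and dtlb_snoop_hit:
        actions.append("copy_entry_to_itlb")
    if dtlb_snoop_hit and modified:
        actions.append("stall_until_store_completes")
    actions.append("fetch_instruction")
    return actions
```

Because a clean (unmodified) quarter page never blocks the fetch, instruction and data caches can share read ownership indefinitely in this embodiment.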
Although the present invention and its objects, features, and advantages have been described in detail, other embodiments are encompassed by the invention as well. For example, the invention has been discussed in terms of separate instruction and data caches for convenience of illustration, but the inventor notes that the invention also comprehends unified cache structures, in which TLB entries may be shared and additional fields may indicate ownership by an instruction path or a data path.
In addition, the invention has been described with reference to particular stages of a pipeline (fetch, translate, and so on). These labels are adopted herein for clarity of disclosure and do not imply a particular pipeline structure. The invention may be employed with pipeline organizations having any number of stages, including those that execute subsets of micro-operations out of order.
The foregoing description has been presented in the context of particular embodiments and their requirements to enable one of ordinary skill in the art to make and use the invention. Various modifications to the preferred embodiment will nonetheless be apparent to those skilled in the art, and the general principles discussed herein may be applied to other embodiments. The invention is therefore not limited to the specific embodiments shown and described here, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein. Those skilled in the art may use the disclosed concepts and specific embodiments as a basis for designing or modifying other structures to carry out the same purposes as the present invention.

Claims (27)

1. An apparatus in a pipelined microprocessor for ensuring coherency of instructions in stages of the pipelined microprocessor, the apparatus comprising:
an instruction cache management logic device, configured to receive an address corresponding to a next instruction, to detect whether a portion of a memory page corresponding to the next instruction can be freely accessed without checking coherency of instructions within the portion of the memory page, and to provide the address when coherency must be checked; and
a synchronization logic device, configured to receive the address from the instruction cache management logic device and to direct a data cache management logic device to check coherency of instructions within the portion of the memory page, wherein, if the next instruction is not coherent with the portion of the memory page, the synchronization logic device directs the pipelined microprocessor to stall fetching of the next instruction until the stages of the pipelined microprocessor have executed all previous instructions.
2. The apparatus of claim 1, wherein the instruction cache management logic device evaluates an entry, corresponding to the address, of an instruction translation lookaside buffer to detect whether the portion of the memory page cannot be freely accessed.
3. The apparatus of claim 2, wherein the entry of the instruction translation lookaside buffer corresponds to the memory page and comprises a plurality of partial-memory-page ownership bits.
4. The apparatus of claim 3, wherein one of the plurality of partial-memory-page ownership bits corresponds to the portion of the memory page, and the remaining partial-memory-page ownership bits correspond to the remaining portions of the memory page.
5. The apparatus of claim 4, wherein access to the portion of the memory page comprises at least the following cases: the portion of the memory page can be freely accessed if the partial-memory-page ownership bit corresponding to the portion of the memory page is set; and the portion of the memory page cannot be freely accessed if the partial-memory-page ownership bit corresponding to the portion of the memory page is not set.
6. The apparatus of claim 1, wherein the data cache management logic device evaluates an entry, corresponding to the address, of a data translation lookaside buffer to detect whether the next instruction is coherent with the portion of the memory page.
7. The apparatus of claim 6, wherein the entry of the data translation lookaside buffer corresponds to the memory page and comprises a plurality of partial-memory-page ownership bits.
8. The apparatus of claim 7, wherein one of the plurality of partial-memory-page ownership bits corresponds to the portion of the memory page, and the remaining partial-memory-page ownership bits correspond to the remaining portions of the memory page.
9. The apparatus of claim 8, wherein the relationship between the next instruction and the memory page comprises at least the following cases: the next instruction is not coherent with the portion of the memory page if the partial-memory-page ownership bit corresponding to the portion of the memory page is set; and the next instruction is coherent with the portion of the memory page if the partial-memory-page ownership bit corresponding to the portion of the memory page is not set.
10. An apparatus in a pipelined microprocessor for ensuring coherency of instructions in stages of the pipelined microprocessor, the apparatus comprising:
a data cache management logic device, configured to receive an address corresponding to a pending store instruction, to detect whether a portion of a memory page corresponding to the store instruction can be freely accessed without checking coherency of instructions within the portion of the memory page, and to provide the address when coherency must be checked; and
a synchronization logic device, configured to receive the address from the data cache management logic device and to direct an instruction cache management logic device to check coherency of instructions within the portion of the memory page, wherein, if an instruction is not coherent with the portion of the memory page, the synchronization logic device directs the pipelined microprocessor to flush previous stages of the pipelined microprocessor.
11. The apparatus of claim 10, wherein the data cache management logic device evaluates an entry, corresponding to the address, of an instruction translation lookaside buffer to detect whether the portion of the memory page cannot be freely accessed.
12. The apparatus of claim 11, wherein the entry of the instruction translation lookaside buffer corresponds to the memory page and comprises a plurality of partial-memory-page ownership bits.
13. The apparatus of claim 12, wherein one of the plurality of partial-memory-page ownership bits corresponds to the portion of the memory page, and the remaining partial-memory-page ownership bits correspond to the remaining portions of the memory page.
14. The apparatus of claim 13, wherein access to the portion of the memory page comprises at least the following cases: the portion of the memory page can be freely accessed if the partial-memory-page ownership bit corresponding to the portion of the memory page is set; and the portion of the memory page cannot be freely accessed if the partial-memory-page ownership bit corresponding to the portion of the memory page is not set.
15. The apparatus of claim 10, wherein the instruction cache management logic device evaluates an entry, corresponding to the address, of a data translation lookaside buffer to detect whether the instruction is coherent with the portion of the memory page.
16. The apparatus of claim 15, wherein the entry of the data translation lookaside buffer corresponds to the memory page and comprises a plurality of partial-memory-page ownership bits.
17. The apparatus of claim 16, wherein one of the plurality of partial-memory-page ownership bits corresponds to the portion of the memory page, and the remaining partial-memory-page ownership bits correspond to the remaining portions of the memory page.
18. The apparatus of claim 17, wherein the instruction is not coherent with the portion of the memory page if the partial-memory-page ownership bit corresponding to the portion of the memory page is set.
19. A method in a pipelined microprocessor for ensuring coherency of instructions in stages of the pipelined microprocessor, the method comprising the steps of:
detecting, in a data cache, whether a portion of a memory page corresponding to a pending store instruction can be freely accessed without checking coherency of instructions within the portion of the memory page;
directing logic within an instruction cache to check coherency of instructions within the portion of the memory page; and
if an instruction is not coherent, flushing previous stages of the pipelined microprocessor.
20. The method of claim 19, wherein the detecting step comprises:
evaluating a data translation lookaside buffer entry corresponding to a destination address of the pending store instruction.
21. The method of claim 20, wherein the data translation lookaside buffer entry corresponds to the memory page and comprises a plurality of partial-memory-page ownership bits.
22. The method of claim 21, wherein one of the plurality of partial-memory-page ownership bits corresponds to the portion of the memory page, and the remaining partial-memory-page ownership bits correspond to the remaining portions of the memory page.
23. The method of claim 22, wherein access to the portion of the memory page comprises at least the following cases: the portion of the memory page can be freely accessed if the partial-memory-page ownership bit corresponding to the portion of the memory page is set; and the portion of the memory page cannot be freely accessed if the partial-memory-page ownership bit corresponding to the portion of the memory page is not set.
24. The method of claim 19, wherein the directing step comprises:
evaluating an instruction translation lookaside buffer entry corresponding to a destination address of the pending store instruction.
25. The method of claim 24, wherein the instruction translation lookaside buffer entry corresponds to the memory page and comprises a plurality of partial-memory-page ownership bits.
26. The method of claim 25, wherein one of the plurality of partial-memory-page ownership bits corresponds to the portion of the memory page, and the remaining partial-memory-page ownership bits correspond to the remaining portions of the memory page.
27. The method of claim 26, wherein the instruction is not coherent with the portion of the memory page if the partial-memory-page ownership bit corresponding to the portion of the memory page is set.
CNA2004100560993A 2003-09-19 2004-08-16 Device and method for store-induced instruction coherency Pending CN1560736A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/665,171 US7263585B2 (en) 2002-09-19 2003-09-19 Store-induced instruction coherency mechanism
US10/665,171 2003-09-19

Publications (1)

Publication Number Publication Date
CN1560736A true CN1560736A (en) 2005-01-05

Family

ID=34194761

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2004100560993A Pending CN1560736A (en) 2003-09-19 2004-08-16 Device and method for store-induced instruction coherency

Country Status (4)

Country Link
US (1) US7263585B2 (en)
EP (1) EP1517232B1 (en)
CN (1) CN1560736A (en)
TW (1) TWI262438B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101034469B (en) * 2005-10-26 2010-05-12 威盛电子股份有限公司 GPU pipeline synchronization and control system and method
CN101395674B (en) * 2006-03-03 2013-07-24 高通股份有限公司 Method and apparatus for testing data steering logic for data storage
WO2014139466A3 (en) * 2013-03-15 2015-08-27 Shanghai Xinhao Microelectronics Co. Ltd. Data cache system and method
CN105975405A (en) * 2015-05-21 2016-09-28 上海兆芯集成电路有限公司 Processor and method for making processor operate

Families Citing this family (17)

Publication number Priority date Publication date Assignee Title
US8621179B2 (en) * 2004-06-18 2013-12-31 Intel Corporation Method and system for partial evaluation of virtual address translations in a simulator
GB0513375D0 (en) * 2005-06-30 2005-08-03 Retento Ltd Computer security
US7360022B2 (en) * 2005-12-29 2008-04-15 Intel Corporation Synchronizing an instruction cache and a data cache on demand
US8683143B2 (en) * 2005-12-30 2014-03-25 Intel Corporation Unbounded transactional memory systems
US8180977B2 (en) * 2006-03-30 2012-05-15 Intel Corporation Transactional memory in out-of-order processors
US8180967B2 (en) * 2006-03-30 2012-05-15 Intel Corporation Transactional memory virtualization
US8479174B2 (en) * 2006-04-05 2013-07-02 Prevx Limited Method, computer program and computer for analyzing an executable computer file
US7945761B2 (en) * 2006-11-21 2011-05-17 Vmware, Inc. Maintaining validity of cached address mappings
US7937561B2 (en) * 2008-04-03 2011-05-03 Via Technologies, Inc. Merge microinstruction for minimizing source dependencies in out-of-order execution microprocessor with variable data size macroarchitecture
US9413721B2 (en) 2011-02-15 2016-08-09 Webroot Inc. Methods and apparatus for dealing with malware
US9804969B2 (en) * 2012-12-20 2017-10-31 Qualcomm Incorporated Speculative addressing using a virtual address-to-physical address page crossing buffer
US9747212B2 (en) * 2013-03-15 2017-08-29 International Business Machines Corporation Virtual unifed instruction and data caches including storing program instructions and memory address in CAM indicated by store instruction containing bit directly indicating self modifying code
US9824012B2 (en) * 2015-09-24 2017-11-21 Qualcomm Incorporated Providing coherent merging of committed store queue entries in unordered store queues of block-based computer processors
KR102376396B1 (en) * 2016-12-07 2022-03-21 한국전자통신연구원 Multi-core processor and cache management method thereof
US10740167B2 (en) * 2016-12-07 2020-08-11 Electronics And Telecommunications Research Institute Multi-core processor and cache management method thereof
KR102377729B1 (en) * 2016-12-08 2022-03-24 한국전자통신연구원 Multi-core processor and operation method thereof
US10706150B2 (en) * 2017-12-13 2020-07-07 Paypal, Inc. Detecting malicious software by inspecting table look-aside buffers

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
US5835949A (en) * 1994-12-27 1998-11-10 National Semiconductor Corporation Method of identifying and self-modifying code
US5742791A (en) 1996-02-14 1998-04-21 Advanced Micro Devices, Inc. Apparatus for detecting updates to instructions which are within an instruction processing pipeline of a microprocessor
US5930821A (en) * 1997-05-12 1999-07-27 Integrated Device Technology, Inc. Method and apparatus for shared cache lines in split data/code caches
US6164840A (en) * 1997-06-24 2000-12-26 Sun Microsystems, Inc. Ensuring consistency of an instruction cache with a store cache check and an execution blocking flush instruction in an instruction queue
US6460119B1 (en) * 1997-12-29 2002-10-01 Intel Corporation Snoop blocking for cache coherency
US6405307B1 (en) * 1998-06-02 2002-06-11 Intel Corporation Apparatus and method for detecting and handling self-modifying code conflicts in an instruction fetch pipeline
US6594734B1 (en) * 1999-12-20 2003-07-15 Intel Corporation Method and apparatus for self modifying code detection using a translation lookaside buffer
US6848032B2 (en) * 2002-09-27 2005-01-25 Apple Computer, Inc. Pipelining cache-coherence operations in a shared-memory multiprocessing system

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN101034469B (en) * 2005-10-26 2010-05-12 威盛电子股份有限公司 GPU pipeline synchronization and control system and method
CN101395674B (en) * 2006-03-03 2013-07-24 高通股份有限公司 Method and apparatus for testing data steering logic for data storage
WO2014139466A3 (en) * 2013-03-15 2015-08-27 Shanghai Xinhao Microelectronics Co. Ltd. Data cache system and method
US9785443B2 (en) 2013-03-15 2017-10-10 Shanghai Xinhao Microelectronics Co. Ltd. Data cache system and method
CN105975405A (en) * 2015-05-21 2016-09-28 上海兆芯集成电路有限公司 Processor and method for making processor operate

Also Published As

Publication number Publication date
US7263585B2 (en) 2007-08-28
EP1517232A2 (en) 2005-03-23
TW200512650A (en) 2005-04-01
EP1517232B1 (en) 2016-03-23
EP1517232A3 (en) 2007-11-14
US20040068618A1 (en) 2004-04-08
TWI262438B (en) 2006-09-21

Similar Documents

Publication Publication Date Title
CN1560736A (en) Device and method for store-induced instruction coherency
US9116817B2 (en) Pointer chasing prediction
US6412057B1 (en) Microprocessor with virtual-to-physical address translation using flags
JP5108002B2 (en) Virtually tagged instruction cache using physical tagging operations
JP5506049B2 (en) Transition from source instruction set architecture (ISA) code to translated code in a partial emulation environment
CN1315060C (en) Tranfer translation sideviewing buffer for storing memory type data
US8364933B2 (en) Software assisted translation lookaside buffer search mechanism
EP2542973B1 (en) Gpu support for garbage collection
US20150121046A1 (en) Ordering and bandwidth improvements for load and store unit and data cache
US9009445B2 (en) Memory management unit speculative hardware table walk scheme
US8190652B2 (en) Achieving coherence between dynamically optimized code and original code
US9131899B2 (en) Efficient handling of misaligned loads and stores
CN1690952A (en) Apparatus and method for selecting instructions for execution based on bank prediction of a multi-bank cache
US9632776B2 (en) Preload instruction control
US20090006803A1 (en) L2 Cache/Nest Address Translation
KR20120070584A (en) Store aware prefetching for a data stream
KR102268601B1 (en) Processor for data forwarding, operation method thereof and system including the same
EP3454219A1 (en) An apparatus and method for efficient utilisation of an address translation cache
US20090006812A1 (en) Method and Apparatus for Accessing a Cache With an Effective Address
US9104593B2 (en) Filtering requests for a translation lookaside buffer
US9009413B2 (en) Method and apparatus to implement lazy flush in a virtually tagged cache memory
Whitham et al. Implementing time-predictable load and store operations
US8688961B2 (en) Managing migration of a prefetch stream from one processor core to another processor core
US20100100702A1 (en) Arithmetic processing apparatus, TLB control method, and information processing apparatus
US6820254B2 (en) Method and system for optimizing code using an optimizing coprocessor

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication